ETAPS Foreword


Welcome to the 22nd ETAPS! This was the first time that ETAPS took place in the Czech
Republic, in its beautiful capital Prague.

ETAPS 2019 was the 22nd instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. 
Each conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from theo- 
retical computer science to foundations to programming language developments, 
analysis tools, formal approaches to software engineering, and security. 

Organizing these conferences in a coherent, highly synchronized conference pro- 
gram enables participation in an exciting event, offering the possibility to meet many 
researchers working in different directions in the field and to easily attend talks of 
different conferences. ETAPS 2019 featured a new program item: the Mentoring 
Workshop. This workshop is intended to help students early in the program with advice 
on research, career, and life in the fields of computing that are covered by the ETAPS 
conference. On the weekend before the main conference, numerous satellite workshops 
took place and attracted many researchers from all over the globe. 

ETAPS 2019 received 436 submissions in total, 137 of which were accepted, 
yielding an overall acceptance rate of 31.4%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con- 
tributions, and in particular the PC (co-)chairs for their hard work in running this entire 
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2019 featured the unifying invited speakers Marsha Chechik (University of 
Toronto) and Kathleen Fisher (Tufts University) and the conference-specific invited 
speakers (FoSSaCS) Thomas Colcombet (IRIF, France) and (TACAS) Cormac 
Flanagan (University of California at Santa Cruz). Invited tutorials were provided by 
Dirk Beyer (Ludwig Maximilian University) on software verification and Cesare 
Tinelli (University of Iowa) on SMT and its applications. On behalf of the ETAPS
2019 attendees, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2019 took place in Prague, Czech Republic, and was organized by Charles 
University. Charles University was founded in 1348 and was the first university in 
Central Europe. It currently hosts more than 50,000 students. ETAPS 2019 was further 
supported by the following associations and societies: ETAPS e.V., EATCS (European 
Association for Theoretical Computer Science), EAPLS (European Association for 
Programming Languages and Systems), and EASST (European Association of Soft- 
ware Science and Technology). The local organization team consisted of Jan Vitek and 
Jan Kofron (general chairs), Barbora Buhnova, Milan Ceska, Ryan Culpepper, Vojtech 
Horky, Paley Li, Petr Maj, Artem Pelenitsyn, and David Safranek. 




The ETAPS SC consists of an Executive Board and representatives of the
individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and
EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns
(Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen
(Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Reykjavik and 
Tallinn), and Lenore Zuck (Chicago). Other members of the SC are: Wil van der Aalst 
(Aachen), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Armin Biere (Linz), 
Luis Caires (Lisbon), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), 
Jurriaan Hage (Utrecht), Reiner Hähnle (Darmstadt), Reiko Heckel (Leicester),
Panagiotis Katsaros (Thessaloniki), Barbara König (Duisburg), Kim G. Larsen
(Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Peter Müller
(Zurich), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), 
Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Dave Sands (Gothenburg), 
Don Sannella (Edinburgh), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), 
Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), Heike Wehrheim 
(Paderborn), Anton Wijs (Eindhoven), and Lijun Zhang (Beijing). 

I would like to take this opportunity to thank all speakers, attendees, organizers
of the satellite workshops, and Springer for their support. I hope you all enjoy the
proceedings of ETAPS 2019. Finally, a big thanks to Jan and Jan and their local 
organization team for all their enormous efforts enabling a fantastic ETAPS in Prague! 


February 2019 Joost-Pieter Katoen 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


This volume contains the papers presented at the 22nd International Conference on 
Fundamental Approaches to Software Engineering (FASE 2019) held during April 
9-11, 2019, in Prague. FASE 2019 was organized as part of the annual European Joint 
Conferences on Theory and Practice of Software (ETAPS 2019). ETAPS is the most 
important and visible annual European event related to software sciences. 

As usual, the papers submitted to FASE focus on the foundations on which software
engineering is built. The submissions covered topics such as software engineering,
requirements engineering, software architectures, specification, software quality,
validation, verification of functional and non-functional properties, model-driven
development and model transformations, software processes, and software evolution.

We received 94 abstract submissions of which 74 were turned into full submissions 
(63 research papers, five tool papers, and six demo papers). We had submissions from 
the following countries (sorted based on the number of submissions): Germany, France, 
Canada, Estonia, USA, Argentina, UK, Norway, Spain, Brazil, China, South Korea, 
Australia, Czechia, Austria, Denmark, Italy, Japan, the Netherlands, Pakistan, 
South Africa, Tunisia, India, Poland, Portugal, Romania, Turkey, Belgium, Colombia, 
Macedonia, Malta, Sweden, and Ukraine. 

Of the 74 submitted papers, 24 papers were accepted after reviewing and discus- 
sions among the Program Committee (PC) members (20 research papers, two tool 
papers, and two demo papers). This corresponds to a 32% acceptance rate. Besides the
30 PC members, there were 100 external reviewers. For the fourth time, FASE used a
double-blind reviewing process. Overall the reviewing process was smooth and it was 
possible to have consensus on all decisions. We thank the PC members and reviewers 
for doing a great job! 

Apart from thanking the authors, we also thank Marsha Chechik (University of 
Toronto) for contributing a paper based on her plenary ETAPS 2019 invited talk, which 
is also included in these proceedings. The title of Marsha’s talk was “Software 
Assurance in an Uncertain World.” She discussed the problem that software systems
are deeply rooted in uncertainty, since most complex open-world functionality is either
not completely specifiable or not cost-effective to specify completely. Moreover, these
systems are placed in uncertain, ever-evolving environments.

This volume shows that, despite the rapid progress in software engineering, there are 
still many open problems. These problems are important for the way we do business, 
the way we govern, and the way we socialize. We depend on complex software
artifacts, yet we still do not fully understand how best to develop and maintain them.
The papers in this volume help to advance the state of the art and will hopefully
inspire and influence future work.

We thank the ETAPS 2019 organizers, in particular, Jan Kofron and Jan Vitek
(general chairs), Barbora Buhnova (publicity chair), Vojtech Horky and Artem
Pelenitsyn (web chairs), and David Safranek (publications chair). We also thank
Joost-Pieter Katoen, the ETAPS SC chair, for managing the whole process, and 
Gabriele Taentzer, the FASE SC chair, for swift feedback on several questions. 

We hope that you will enjoy reading the volume. 


February 2019 Wil van der Aalst 
Software Assurance in an Uncertain World

Sahar Kokaly, and Mona Rahimi


University of Toronto, Toronto, Canada 
chechik@cs.toronto.edu 


Abstract. From financial services platforms to social networks to vehicle
control, software has come to mediate many activities of daily life.
Governing bodies and standards organizations have responded to this
trend by creating regulations and standards to address issues such as
safety, security and privacy. In this environment, the compliance of
software development with standards and regulations has emerged as a key
requirement. Compliance claims and arguments are often captured in
assurance cases, with linked evidence of compliance. Evidence can come
from test cases, verification proofs, human judgment, or a combination
of these. That is, experts try to build (safety-critical) systems carefully
according to well-justified methods and articulate these justifications in
an assurance case that is ultimately judged by a human. Yet software
is deeply rooted in uncertainty: most complex open-world functionality
(e.g., perception of the state of the world by a self-driving vehicle)
is either not completely specifiable or not cost-effective to specify
completely; software systems are often placed into uncertain environments;
and there can be uncertainties that need to be considered at the design
phase. We argue that the role of assurance cases is to be the grand unifier
for software development, focusing on capturing and managing uncertainty.
We discuss three approaches for arguing about safety and security of
software under uncertainty, in the absence of fully sound and complete
methods: assurance argument rigor, semantic evidence composition, and
applicability to new kinds of systems, specifically those relying on ML.


1 Introduction 


From financial services platforms to social networks to vehicle control, software 
has come to mediate many activities of daily life. Governing bodies and standards 
organizations have responded to this trend by creating regulations and standards 
to address issues such as safety, security and privacy. In this environment, the 
compliance of software development with standards and regulations has emerged
as a key requirement. 

Development of safety-critical systems begins with hazard analysis, aimed at
identifying possible causes of harm. It uses the severity, probability and
controllability of a hazard’s occurrence to assign Safety Integrity Levels (in the
automotive industry, these are referred to as ASILs [35]); the higher the ASIL,
the more rigor is expected in identifying and mitigating the hazard.
Mitigating hazards therefore becomes the main requirement of the system, with
system safety requirements directly linked to the hazards. These requirements
are then refined along the left-hand side (LHS) of the V-model until individual
modules and their implementation can be built. The right-hand side (RHS) includes
appropriate testing and validation, used as supporting evidence in developing an
argument that the system adequately handles its hazards, with the expectation that
the higher the ASIL, the stronger the required justification of safety.
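As a minimal illustration of how the three ratings can combine into a level, the sketch below assumes the ISO 26262 scheme (the automotive standard that defines ASILs, which calls the probability dimension "exposure") and uses the additive shortcut commonly used to reproduce its risk table: with classes S1-S3, E1-E4 and C1-C3, a sum of 7, 8, 9 or 10 maps to ASIL A, B, C or D, a lower sum to QM (quality management only), and a class of 0 in any dimension also yields QM. The function name `asil` is ours, and the mapping should be checked against the standard's table before being relied on.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return 'QM' or ASIL 'A'-'D' for ISO 26262-style hazard classes.

    Assumption: severity S0-S3, exposure E0-E4 and controllability C0-C3 are
    given as integers, and the additive shortcut S + E + C reproduces the
    standard's risk table. A class of 0 in any dimension means no ASIL (QM).
    """
    if not (0 <= severity <= 3 and 0 <= exposure <= 4 and 0 <= controllability <= 3):
        raise ValueError("hazard class out of range")
    if 0 in (severity, exposure, controllability):
        return "QM"
    total = severity + exposure + controllability
    # Sums 7..10 map to A..D; 6 or less needs only quality management (QM).
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


# A severe, frequent, hard-to-control hazard versus milder ones.
print(asil(3, 4, 3))  # D
print(asil(3, 2, 3))  # B
print(asil(1, 3, 2))  # QM
```

The resulting level then determines how much rigor, and how strong a justification, the assurance case has to provide for that hazard.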

Assurance claims and arguments are often captured by assurance cases, with
linked evidence supporting them. Evidence can come from test cases, verification
proofs, human judgment, or a combination of these. Assurance cases organize
information so that the argument can be unfolded comprehensively, ultimately
allowing safety engineers to determine whether they trust that the system was
adequately designed to avoid systematic faults (before delivery) and to adequately
detect and react to failures at runtime [35].

Yet software is deeply rooted in uncertainty: most complex open-world functionality
(e.g., perception of the state of the world by a self-driving vehicle) is either not
completely specifiable or not cost-effective to specify completely [12]. Software
systems are often placed into uncertain environments [48], and there
can be uncertainties that need to be considered at the design phase [20]. Thus,
we believe that the role of assurance cases is to explicitly capture and manage
uncertainty coming from different sources, assess it and ultimately reduce it to an
acceptable level, whether with respect to a standard, company processes, or assessor
judgment. The various software development steps are currently not well
integrated, and uncertainty is not expressed or managed explicitly in a uniform
manner. Our claim in this paper is that an assurance case is the unifier among
the different software development steps, and can be used to make uncertainties
explicit, which also makes them manageable. This provides a well-founded basis
for modeling confidence about the satisfaction of a critical system quality (security,
safety, etc.) in an assurance case, giving assurance cases a crucial role
in software development. Specifically, we enumerate sources of uncertainty in
software development. We also argue that organizing software development and
analysis activities around the assurance case as a living document allows all parts
of software development to explicitly articulate uncertainty, the steps taken to
manage it, and the degree of confidence that the activities producing the evidence
have been performed correctly. This information can then help potential assessors
check that the development outcome adequately satisfies the desired software
quality (e.g., safety).

The area of system dependability has produced a significant body of work
describing how to model assurance cases (e.g., [4,5,14,38]) and how to assess
reviewers’ confidence in the argument being made (e.g., [16,31,45,59,60]). There
is also early work on assessing the impact of system changes on the assurance
argument [39]. A recent survey [43] provides a comprehensive list of assurance case
tools developed over the past 20 years and an analysis of their functionalities,
including support for assurance case creation, assessment and maintenance. We
believe that the road to truly making assurance cases the grand unifier for software
development of complex high-assurance systems has many challenges. One is to be
able to successfully argue about safety and security of software under uncertainty,
without fully sound and complete methods. For that, we believe that assurance
arguments must be rigorous and that we need to properly understand how to perform
evidence composition for traditional systems, but also for new kinds of systems,
specifically those relying on ML. We discuss these issues below.


Rigor. To be validated or reused, assurance case structures must be as rigorous
as possible [51]. Of course, assurance arguments ultimately depend on human
judgment (with some facts treated as “obvious” and “generally acceptable”),
but the structure of the argument should be fully formal so that its completeness
can be assessed. Bandur and McDermid called this approach “formal modulo
engineering expertise” [1].


Evidence Composition. We need to effectively combine the top-down process 
of uncertainty reduction with the bottom-up process of composing evidence, 
specifically, evidence obtained from applying testing and verification techniques. 


Applicability to “new” kinds of systems. We believe that our view — rig- 
orous, uncertainty-reduction focused and evidence composing — is directly appli- 
cable to systems developed using machine learning, e.g., self-driving cars. 

This paper is organized as follows: In Sect. 2, we briefly describe the syntax of
assurance cases. In Sect. 3, we outline possible sources of uncertainty encountered
as part of system development. In Sect. 4, we describe the benefits of a rigorous 
language for assurance cases by way of example. In Sect. 5, we describe, again by 
way of example, a possible method of composing evidence. In Sect. 6, we develop 
a high-level assurance case for a pedestrian detection subsystem. We conclude 
in Sect. 7 with a discussion of possible challenges and opportunities. 


2 Background on Assurance Case Modeling Notation 


The most commonly used representation for safety cases is the graphical Goal
Structuring Notation (GSN) [30], which is intended to support the assurance of
critical properties of systems (including safety). GSN comprises six core
elements (see Fig. 1¹). Arguments in GSN are typically organized into a tree
of these elements. The root is the overall goal to be satisfied by the system, and
it is gradually decomposed (possibly via strategies) into sub-goals and finally into
solutions, which are the leaves of the safety case. Connections between goals,
strategies and solutions represent supported-by relations, which indicate
inferential or evidential relationships between elements. Goals and strategies may
optionally be associated with contexts, assumptions and/or justifications by means
of in-context-of relations, which declare a contextual relationship between the
connected elements.


1 In this paper, we use both diamond and triangle shapes interchangeably to depict 
an “undeveloped” element. 
Fig. 1. Core GSN elements from [30].
Fig. 2. Example safety case in GSN (from [30]).


For example, consider the safety case in Fig. 2. The overall goal G1 is that 
the “Control System is acceptably safe to operate” given its role, context and 
definition, and it is decomposed into two sub-goals: G2, for eliminating and mit- 
igating all identified hazards, and G3, for ensuring that the system software is 
developed to an appropriate ASIL. Assuming that all hazards have been identified,
G2 can in turn be decomposed into three sub-goals by considering each hazard
separately (S1), and each per-hazard sub-goal is shown to be satisfied using
evidence from formal verification (Sn1) or fault tree analysis (Sn2). Similarly,
under some specific context and justification, G3 can be decomposed into two
sub-goals, each of which is shown to be satisfied by the associated evidence.
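To make the tree structure concrete, the sketch below encodes the Fig. 2 argument as Python objects and checks a few well-formedness rules implied by the description above: supported-by links go from goals or strategies to goals, strategies or solutions; in-context-of links go to contexts, assumptions or justifications; solutions and contextual elements are leaves; and a goal or strategy without support must be marked undeveloped. Only G1, G2, G3, S1, Sn1 and Sn2 come from the text. The identifiers G4-G8, S2, Sn3, Sn4, A1, C1 and J1, all node texts, and the helper names are hypothetical, and the rules are our reading of GSN rather than the full GSN standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CONTEXTUAL = {"context", "assumption", "justification"}
SUPPORTABLE = {"goal", "strategy", "solution"}


@dataclass
class Node:
    ident: str
    kind: str  # goal, strategy, solution, context, assumption or justification
    text: str = ""
    undeveloped: bool = False  # the diamond/triangle marker
    supported_by: list[Node] = field(default_factory=list)
    in_context_of: list[Node] = field(default_factory=list)


def check(node: Node) -> list[str]:
    """Collect structural problems in the argument tree rooted at `node`."""
    problems: list[str] = []
    if node.kind in ("goal", "strategy"):
        if not node.supported_by and not node.undeveloped:
            problems.append(f"{node.ident}: unsupported but not marked undeveloped")
        for child in node.supported_by:
            if child.kind not in SUPPORTABLE:
                problems.append(f"{node.ident} -> {child.ident}: bad supported-by target")
            problems.extend(check(child))
        for ctx in node.in_context_of:
            if ctx.kind not in CONTEXTUAL:
                problems.append(f"{node.ident} -> {ctx.ident}: bad in-context-of target")
    elif node.supported_by or node.in_context_of:
        problems.append(f"{node.ident}: a {node.kind} must be a leaf")
    return problems


def leaves(node: Node) -> list[str]:
    """Identifiers of the supported-by leaves, in first-visit order, without repeats."""
    if not node.supported_by:
        return [node.ident]
    found: list[str] = []
    for child in node.supported_by:
        for ident in leaves(child):
            if ident not in found:
                found.append(ident)
    return found


# Fig. 2, simplified; identifiers other than G1-G3, S1, Sn1, Sn2 are hypothetical.
sn1 = Node("Sn1", "solution", "Formal verification")
sn2 = Node("Sn2", "solution", "Fault tree analysis")
s1 = Node("S1", "strategy", "Argument over each identified hazard",
          supported_by=[Node("G4", "goal", "Hazard 1 handled", supported_by=[sn1]),
                        Node("G5", "goal", "Hazard 2 handled", supported_by=[sn2]),
                        Node("G6", "goal", "Hazard 3 handled", supported_by=[sn2])],
          in_context_of=[Node("A1", "assumption", "All hazards have been identified")])
g2 = Node("G2", "goal", "All identified hazards eliminated or mitigated", supported_by=[s1])
s2 = Node("S2", "strategy", "Argument over the two sub-goals of G3",
          supported_by=[Node("G7", "goal", supported_by=[Node("Sn3", "solution")]),
                        Node("G8", "goal", supported_by=[Node("Sn4", "solution")])],
          in_context_of=[Node("J1", "justification")])
g3 = Node("G3", "goal", "Software developed to an appropriate ASIL", supported_by=[s2])
g1 = Node("G1", "goal", "Control System is acceptably safe to operate",
          supported_by=[g2, g3], in_context_of=[Node("C1", "context", "Role and definition")])

print(check(g1))   # []
print(leaves(g1))  # ['Sn1', 'Sn2', 'Sn3', 'Sn4']
```

Setting `undeveloped=True` on a goal lets a partially developed argument pass the check while still recording that it is incomplete.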


3 Sources of Uncertainty in Software Development 


In this section, we briefly survey uncertainty in software development, broadly
split into the categories of uncertainties about the specifications, about the
environment, about the system itself, and about the argument of its safety. For each
category, we discuss how building an assurance case relates to understanding and
mitigating such uncertainties.


Uncertainty in Specifications. Software specifications tend to suffer from 
incompleteness, inconsistency and ambiguity [42,46]. Specification uncertainty 
stems from a misunderstanding or an incomplete understanding of how the sys- 
tem is supposed to function in early phases of development; e.g., miscommuni- 
cation and inability of stakeholders to transfer knowledge due to differing con- 
cepts and vocabularies [2,13]; unknown values for sets of known events (a.k.a. 
the known unknowns); and the unknown and unidentifiable events (a.k.a. the 
unknown unknowns) [57]. 

Recently, machine-learning approaches for interactively learning the software 
specifications have become popular; we discuss one such example, of pedestrian 
detection, in Sect. 6. Other mitigations of specification uncertainties, suggested
by various standards and research, are identification of edge cases [36], hazard 
and obstacle analysis [55] to help identify unknown unknowns [35], step-wise 
refinement to handle partiality in specifications, ontology- [9] and information 
retrieval-driven requirements engineering approaches [21], as well as generally 
building arguments about addressing specification uncertainties. 


Environmental Uncertainty. The system’s environment can refer to adjacent 
agents interacting with the system, a human operator using the system, or phys- 
ical conditions of the environment. Sources of environmental uncertainties have 
been thoroughly investigated [19,48]. One source originates from unpredictable 
and changing properties of the environment, e.g., assumptions about actions of 
other vehicles in the autonomous vehicle domain or assuming that a plane is 
on the runway if its wheels are turning. Another uncertainty source is input 
errors from broken sensors, missing, noisy and inaccurate input data, imprecise 
measurements, or disruptive control signals from adjacent systems. Yet another 
source might be when changes in the environment affect the specification. For 
example, consider a robotic arm that moves with the expected precision but the 
target has moved from its estimated position. 

A number of techniques have been developed to mitigate environmental 
uncertainties, e.g., runtime monitoring systems such as RESIST [10], or machine- 
learning approaches such as FUSION [18] which self-tune the adaptive behavior 
of systems to unanticipated changes in the environment. More broadly, environ- 
mental uncertainties are mitigated by a careful requirements engineering process, 
by principled system design and, in assurance cases, by an argument that they 
have been adequately identified and adequately handled.


System Uncertainties. One important source of uncertainty is faced by devel- 
opers who do not have sufficient information to make decisions about their sys- 
tem during development. For example, a developer may have insufficient infor- 
mation to choose a particular implementation platform. In [19,48], this source 
of uncertainty is referred to as design-time uncertainty, and some approaches to 
handling it are offered in [20]. It is crucial to record the decisions made while
resolving such uncertainties in the assurance argument, to capture their context,
e.g., that a particular platform was selected for its performance at the expense
of memory requirements.

Another uncertainty refers to the correctness of the implementation [7]. This
uncertainty lies in the V&V procedure and arises from questions of whether the
implementation of the tool can be trusted, whether the tool is used appropriately
(that is, its assumptions are satisfied), and, in general, whether a particular
verification technique is the right one for verifying the fulfillment of the system
requirements [15]. We address some of these uncertainties in Sect. 5.


Argument Uncertainty. The use of safety arguments to demonstrate safety
of software-intensive systems raises questions such as the extent to which these
arguments can be trusted. That is, how confident are we that verified and validated
software is actually safe? How much evidence, and how thorough an argument,
do we require for that?

To assess uncertainties which may affect the system’s safety, researchers have
proposed techniques to estimate confidence in structured assurance cases, using
either qualitative or quantitative approaches [27,44]. The majority of these are
based on Dempster-Shafer Theory [31,60], Jøsang’s Opinion Triangle [17],
Bayesian Belief Networks (BBNs) [16,61], Evidential Reasoning (ER) [45] and
weighted averages [59]. The approaches which use BBNs treat safety goals as
nodes in the network and try to compute their conditional probability based on
given probabilities for the leaf nodes of the network. Dempster-Shafer Theory is
similar to BBNs but is based on belief and plausibility functions: separate pieces
of evidence are combined (using Dempster’s rule of combination) to compute the
degree of belief in a claim. The ER approach [45] allows assessors to provide
individual judgments concerning the trustworthiness and appropriateness of the
evidence, building an argument separate from the assurance case.
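To illustrate the Dempster-Shafer style of combination mentioned above, the sketch below applies Dempster's rule of combination to two hypothetical pieces of evidence about one claim over the frame {safe, unsafe}. Mass placed on the whole frame expresses uncommitted belief (ignorance), and the resulting belief and plausibility values bracket confidence in the claim. This is the textbook rule, not the specific confidence-assessment methods of [31,60]; the mass values and the names `testing` and `review` are invented for illustration.

```python
from itertools import product

SAFE = frozenset({"safe"})
UNSAFE = frozenset({"unsafe"})
THETA = SAFE | UNSAFE  # the whole frame: mass here means "don't know"

Mass = dict[frozenset, float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalize."""
    combined: Mass = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # mass on contradictory pairs of focal sets
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


def belief(m: Mass, a: frozenset) -> float:
    """Mass committed to subsets of `a` (lower bound on support for `a`)."""
    return sum(v for b, v in m.items() if b <= a)


def plausibility(m: Mass, a: frozenset) -> float:
    """Mass not contradicting `a` (upper bound on support for `a`)."""
    return sum(v for b, v in m.items() if b & a)


# Hypothetical evidence for the claim "the component is safe".
testing = {SAFE: 0.6, THETA: 0.4}
review = {SAFE: 0.5, UNSAFE: 0.1, THETA: 0.4}

fused = combine(testing, review)
print(round(belief(fused, SAFE), 3), round(plausibility(fused, SAFE), 3))  # 0.787 0.957
```

Here the 0.06 of conflicting mass (testing supports "safe" while the review assigns some mass to "unsafe") is discarded by renormalization, a property of Dempster's rule that is often criticized when sources disagree strongly.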

These approaches focus on assigning and propagating confidence measures
but do not specifically address uncertainty in the argument itself. They also focus
on aggregating evidence coming from multiple sources but treat it as a “black box”,
rather than examining how a piece of evidence from one source might compose with
another. We look at these questions in Sects. 4 and 5, respectively.


4 Formality in Assurance Cases 


As discussed in Sect. 1, we believe that the ultimate goal of an assurance case 
is to explicitly capture and manage uncertainty, and ultimately reduce it to an 
acceptable level. Even informal arguments improve safety, e.g., by making peo- 
ple decompose the top level goal case-wise, and examine the decomposed parts 
critically. But the decomposed cases tend to have an ad hoc structure dictated 
by experience and preference, with under-explored completeness claims, giving 
both developers and regulators a false sense of confidence, no matter how con- 
fidence is measured, since they feel that their reasoning is rigorous even though 
it is not [58]. Moreover, as assurance cases are produced and judged by humans, 
they are typically based on inductive arguments. Such arguments are susceptible 
to fallacies (e.g., arguing through circular reasoning, or using justifications based
on false dichotomies), and evaluations by different reviewers may lead to the
discovery of different fallacies [28].

Fig. 3. A fragment of the Lane Management System (LMS) safety case.

There have been several attempts to improve credibility of an argument 
by making the argument structure more formal. [25] introduces the notion of 
confidence maps as an explicit way of reasoning about sources of doubt in an 
argument, and proposes justifying confidence in assurance arguments through 
eliminative induction (i.e., an argument by eliminating sources of doubt). [29] 
highlights the need to model both evidential and argumentation uncertainties 
when evaluating assurance arguments, and considers applications of the formally 
evaluatable extension of Toulmin’s argument style proposed by [56]. [11] details 
VAA — a method for assessing assurance arguments based on Dempster-Shafer 
theory. [51] is a proponent of completely deductive reasoning, narrowing the 
scope of the argument so that it can be formalized and potentially formally 
checked using automated theorem provers, arguing that this would give a modular
framework for assessing (and, we presume, reusing) assurance cases. [1]
relaxes Rushby’s position a bit, aiming instead at formal assurance argumentation
“modulo engineering expertise”, and argues that proof obligations about the
consistency of arguments remain valid even for assurance arguments that are not
fully formal. To this end, the authors provide a specific formalization of goal
validity given the validity of subgoals and contexts/context assumptions, resulting
in rules such as “assumptions on any given element must not be contradictory nor
contradict the context assumed for that goal” [1].
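When assumptions and context are written as propositional formulas, the rule just quoted can be checked mechanically. The sketch below brute-forces satisfiability over a handful of atoms and reports which half of the rule fails. The atoms (`highway`, `sensor_ok`, `daylight`), the formulas and the function names are hypothetical, and this is not the formalization of [1]; a realistic tool would hand such formulas to a SAT or SMT solver instead of enumerating assignments.

```python
from itertools import product
from typing import Callable, Optional

Env = dict[str, bool]
Formula = Callable[[Env], bool]

ATOMS = ["highway", "sensor_ok", "daylight"]  # hypothetical propositions


def satisfiable(formulas: list[Formula], atoms: list[str] = ATOMS) -> bool:
    """Brute force: does some assignment to `atoms` make every formula true?"""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if all(f(env) for f in formulas):
            return True
    return False


def check_element(context: list[Formula], assumptions: list[Formula]) -> Optional[str]:
    """Apply both halves of the quoted rule to one argument element."""
    if not satisfiable(assumptions):
        return "assumptions contradict each other"
    if not satisfiable(context + assumptions):
        return "assumptions contradict the context"
    return None


# Hypothetical context and assumptions for a lane-keeping goal.
def on_highway(e: Env) -> bool:
    return e["highway"]


def sensor_ok(e: Env) -> bool:
    return e["sensor_ok"]


def urban_only(e: Env) -> bool:  # clashes with the context
    return not e["highway"]


def sensor_failed(e: Env) -> bool:  # clashes with sensor_ok
    return not e["sensor_ok"]


context = [on_highway]
print(check_element(context, [sensor_ok]))                 # None
print(check_element(context, [sensor_ok, urban_only]))     # assumptions contradict the context
print(check_element(context, [sensor_ok, sensor_failed]))  # assumptions contradict each other
```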


Our Position. We believe that a degree of formality in assurance cases can go
a long way not only towards establishing their validity, identifying and framing
implicit uncertainties and avoiding fallacies, but also towards supporting assurance
case modularity, refactoring and reuse. We illustrate this position with an example.


Example. Consider two partially developed assurance cases that argue that the 
lane management system (LMS) of a vehicle is safe (Figs. 3 and 4). The top-level 
safety goal G1 in Fig. 3 is first decomposed by the strategy Str1 into a set of 
subgoals which assert the safety of the LMS subsystems. An assessor can only 
trust that goals G2 and G3 imply G1 by making an implicit assumption that 
the system safety is completely determined by the safety of its individual subsystems.
Neither the need for this assumption nor the credibility of the assumption
itself is made explicit in the assurance case, which weakens the argument and
complicates the assessment process. The argument is further weakened by the
absence of a completeness claim that all subsystems have been covered by this
decomposition.

Strategies Str2 and Str3 in Fig. 3 decompose the safety claims about each 
subsystem into arguments over the relevant hazards. Yet the hazards themselves 
are never explicitly stated in the assurance case, making the direct relevance of 
each decomposed goal to its corresponding parent goal, and thus to the argument 
as a whole, unclear. While goals G6 and G9 attempt to provide completeness 
claims for their respective decompositions, they do so by citing lack of negative
evidence without describing efforts to uncover such evidence. This justification 
is fallacious and can be categorized as “an argument from ignorance” [28]. 

Now consider the assurance case in Fig. 4 which presents a variant of the argu- 
ment in Fig. 3, refined with context nodes, justification nodes and completeness 
claims. The top-level goal G1 is decomposed into a set of subgoals asserting 
that particular hazards have been mitigated, as well as a completeness claim 
G3C stating that hazards H1 and H2 are the only ones that may be prevalent 
enough to defeat claim G1. Context nodes C1 and C2 define the hazards them- 
selves, which clarifies the relevance of each hazard-mitigating goal. The node J1 
provides a justification for the validity of Str1 by framing the decomposition
as a proof by (exhaustive) cases. That is, Str1 is justified by the statement
that if H1 and H2 are the only hazards that could potentially make the system
unsafe, then the system is safe if H1 and H2 have been adequately mitigated.
This rigorous argument can be represented by the logical expression
$G3C \Rightarrow ((G2 \land G4) \Rightarrow G1)$: if completeness holds, then G2 and G4
are sufficient to show G1. We now have a rigorous argument step in which our
confidence in G1 is a direct consequence of our confidence in its decomposed goals
G2, G3C and G4, even though there may still be uncertainty in the evidential
evaluation of G2, G3C and G4. That is, uncertainty has been made explicit and can
be reasoned about at the evidential level. By removing argumentation uncertainty
and explicating implicit assumptions, we get a more comprehensive framework
for assurance case evaluation, where the relation between all reasoning steps is
formally clear. Note that if the justification provides an inference rule, then the
argument becomes deductive. Otherwise, it is weaker (the justification node can
be used to quantify how much weaker) but still rigorous.

While the completeness claim G3C in Fig. 4 may be directly supported by 
evidence, the goals G2 and G4 are further decomposed by the strategies Str2 
and Str3, respectively, which represent decompositions over subsystems. These 
strategies are structured similarly to Str1, and can be expressed by the logical
expressions $G7C \Rightarrow ((G5 \land G6) \Rightarrow G2)$ and
$G10C \Rightarrow ((G8 \land G9) \Rightarrow G4)$, respectively. In Fig. 3, a
decomposition by subsystems was applied directly to the top-level safety goal, which
necessitated a completeness claim that the safety of all individual subsystems
implied the safety of the entire system.
Instead, the argument in Fig. 4 only needs to show that the set of subsystems in 
each decomposition is complete w.r.t. a particular hazard, which may be a more 
feasible claim to argue. This ability to transform an argument into a more easily 
justifiable form is another benefit of arguing via rigorous reasoning steps. 
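Treating each goal of Fig. 4 as a propositional variable (our simplification, which deliberately ignores evidential uncertainty), the three strategy expressions can be checked mechanically. The minimal sketch below, with hypothetical helper names `step_premise`, `fact` and `entails`, enumerates all truth assignments to confirm that G1 follows from the completeness claims and the leaf goals, and that the deduction fails once the completeness claim G3C is dropped.

```python
from itertools import product

# Each argument step of Fig. 4 as (completeness claim, subgoals, parent goal),
# read as: claim => ((conjunction of subgoals) => parent).
STEPS = [
    ("G3C", ["G2", "G4"], "G1"),   # Str1: decomposition over hazards H1, H2
    ("G7C", ["G5", "G6"], "G2"),   # Str2: decomposition over subsystems
    ("G10C", ["G8", "G9"], "G4"),  # Str3: decomposition over subsystems
]
GOALS = sorted({g for claim, subs, parent in STEPS for g in (claim, parent, *subs)})


def implies(a, b):
    return (not a) or b


def step_premise(claim, subgoals, parent):
    """Encode one strategy as a propositional formula over an assignment env."""
    return lambda env: implies(env[claim],
                               implies(all(env[s] for s in subgoals), env[parent]))


def fact(goal):
    """A goal assumed to be established directly by evidence."""
    return lambda env: env[goal]


def entails(premises, conclusion):
    """Truth-table check: every assignment satisfying all premises satisfies the conclusion."""
    for values in product([False, True], repeat=len(GOALS)):
        env = dict(zip(GOALS, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False
    return True


rules = [step_premise(*step) for step in STEPS]
leaves = ["G3C", "G5", "G6", "G7C", "G8", "G9", "G10C"]

print(entails(rules + [fact(g) for g in leaves], fact("G1")))                # True
print(entails(rules + [fact(g) for g in leaves if g != "G3C"], fact("G1")))  # False
```

Brute-force enumeration is only practical for a handful of goals (2^10 assignments here); for larger arguments the same formulas could be handed to a SAT solver.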


5 Combining Evidence 


Evidence for assurance cases can come from a variety of sources: results from
different testing and verification techniques, human judgment, or their
combination. Multiple testing and verification techniques may be used to make
the evidence more complete. A verification technique complements another if it
is able to verify types of requirements which cannot be verified by the other
technique. For example, results of verification of properties via a bounded
model checker (BMC) are complemented by additional test cases [8]. A
verification technique supports another if it is used to detect faults in the
other’s verification results, thus providing backing evidence [33]. For example,
a model checking technique may support a static analysis technique by verifying
the faults detected [6]. Note that these approaches are fundamentally different
from just aggregating evidence while treating it as a black box!

Fig. 5. Confidence argument for code review workflow (from [6]).

Habli and Kelly [32] and Denney and Pai [15] present safety case patterns 
for the use of formal method results for certification. Bennion et al. [3] present a 
safety case for arguing the compliance of a particular model checker, namely, the 
Simulink Design Verifier for DO-178C. Gallina and Andrews [23] argue about the
adequacy of a model-based testing process, and Carlan et al. [7] provide a safety
case pattern for choosing and composing verification techniques based on how they
contribute to the identification or mitigation of systematic faults known to affect
system safety.


Our Position. We, as a community, need to figure out the precise conditions 
under which particular testing and verification techniques “work” (e.g., model- 
ing floating-point numbers as reals, making a small model hypothesis to justify 
sufficiency of a particular loop unrolling, etc.), and how they are intended to 
be composed in order to reduce uncertainty about whether software satisfies its 
specification. We illustrate a particular composition here. 


Example. In this example, taken from [6], a model checker supports static
analysis tools (which may produce false positives) by verifying the detected
faults. The assurance case is based on a workflow (not shown here) in which an
initial review report is constructed by running static analysis tools and
possibly by peer code reviews. Then the program is annotated with the negation of
each potential erroneous behavior as a desirable property for the program, and
given to a model-checker. If the model-checker is able to verify the property,
the alleged error is removed from the initial review report and no longer
considered an error. If the model-checker finds a violation, the alleged error is
confirmed. In this case, a weakest-precondition generation mechanism is applied
to find the environmental conditions (external parameters that are not under the
control of the program) under which the program shows the erroneous behavior.
These conditions and the error trace are then added to the error description.
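The workflow above can be sketched as a report-filtering loop. This is a minimal illustration, not the tooling of [6]: `model_check` and `weakest_precondition` are hypothetical callbacks standing in for the model checker and the weakest-precondition generator, the file locations are made up, and keeping inconclusive results in the report (marked unresolved) is our own conservative assumption, since the workflow does not say how such cases are handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    VERIFIED = "verified"          # negated-error property holds: false positive
    VIOLATED = "violated"          # counterexample found: error confirmed
    INCONCLUSIVE = "inconclusive"  # e.g., the model checker ran out of resources


@dataclass
class Finding:
    location: str
    description: str
    status: str = "alleged"
    trace: Optional[List[str]] = None
    env_conditions: Optional[str] = None


@dataclass
class McResult:
    verdict: Verdict
    trace: Optional[List[str]] = None


def filter_report(findings: List[Finding],
                  model_check: Callable[[Finding], McResult],
                  weakest_precondition: Callable[[Finding, List[str]], str]) -> List[Finding]:
    """Filter an initial review report with a model checker (hypothetical hooks)."""
    report = []
    for f in findings:
        result = model_check(f)
        if result.verdict is Verdict.VERIFIED:
            continue  # dismissed: removed from the report
        if result.verdict is Verdict.VIOLATED:
            f.status = "confirmed"
            f.trace = result.trace
            f.env_conditions = weakest_precondition(f, result.trace)
        else:
            f.status = "unresolved"  # assumption: keep inconclusive cases
        report.append(f)
    return report


# Toy usage with stubbed tools.
findings = [Finding("foo.c:10", "possible null dereference"),
            Finding("foo.c:42", "possible division by zero"),
            Finding("foo.c:77", "possible integer overflow")]
stub = {"foo.c:10": McResult(Verdict.VERIFIED),
        "foo.c:42": McResult(Verdict.VIOLATED, ["read d", "d == 0", "x / d"]),
        "foo.c:77": McResult(Verdict.INCONCLUSIVE)}
report = filter_report(findings, lambda f: stub[f.location], lambda f, t: "input d == 0")
for f in report:
    print(f.location, f.status, f.env_conditions)
```

Treating an inconclusive answer as "still alleged" is the safe default; treating it as a dismissal would silently trust the model checker in exactly the situation the text flags as an open problem.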

The paper [6] presents both the assurance case and the confidence argument 
for the code review workflow. We reproduce only the latter here (see Fig. 5), 
focusing on reducing uncertainty about the accuracy and consistency of the code 
property (goal G2). False positives generated by static analysis are mitigated 
using BMC — a method with a completely different verification rationale, thus 
implementing the safety engineering principle of independence (J2). Strategy 
(Str2) explains how errors can be confirmed or dismissed using BMC (goal 
G6). The additional information given by BMC can be used for the mitigation 
of the error (C2). 

This approach takes good steps towards mitigating particular assurance 
deficits using a composition of verification techniques but leaves open several 
problems: How can we ensure that BMC runs under the same environmental
conditions as the static analysis tools? How deeply should the loops be unrolled?
What should be done when the model-checker runs out of resources without giving
a conclusive answer? And, in general, under what conditions is it safe to trust
the “yes” answers of the model-checker?


6 Assurance Cases for ML Systems 


Academia and industry are actively building systems using AI and machine 
learning, including a rapid push for ML in safety-critical domains such as medical 
devices and self-driving cars. For their successful adoption in society, we need to 
ensure that they are trustworthy, including obtaining confidence in their behavior 
and robustness. 
Fig. 6. A partially developed GSN safety case of pedestrian detector example.


Significant strides have already been made in this space, from extending
mature testing and verification techniques to reasoning about neural
networks [24,37,47,54] for properties such as safety, robustness and adequate
handling of adversarial examples [26,34]. There is active work in designing
systems that balance learning under uncertainty and acting safely, e.g., [52], as
well as on the broad notions of fairness and explainability in AI, e.g., [49].


Our Position. We believe that assurance cases remain a unifying view for ML- 
based systems just as much as for more conventional systems, allowing us to 
understand how the individual approaches fit into the overall goal of assuring 
safety and reliability and where there are gaps. 


Example. We illustrate this idea with an example of a simple pedestrian detec- 
tor (PD) component used as part of an autonomous driving system. The func- 
tions that PD supports consist of detection of objects in the environment ahead 
of the vehicle, classification of an object as a pedestrian or other, and localiza- 
tion of the position and extent of the pedestrian (indicated by bounding box). 
We assume that PD is implemented as a convolutional deep neural network 
with various stages to perform feature extraction, proposing regions containing 
objects and classification of the proposed objects. This is a typical approach for 
two-stage object detectors (e.g., see [50]). 
Fig. 7. A framework for factors affecting perceptual uncertainty (source: [12]).


As part of a safety-critical system, PD contributes to the satisfaction of a
top-level safety goal requiring that the vehicle always maintain a safe distance
from all pedestrians. Specific safety requirements for PD can be derived from
this goal, such as (RQ1) the PD misclassification rate (i.e., classifying a
pedestrian as “other”) must be less than $p_{mc}$, (RQ2) the PD false positive
rate (i.e., classifying any non-pedestrian object or non-object as “pedestrian”)
must be less than $p_{fp}$, and (RQ3) the PD missed detection rate (i.e., missing
the presence of a pedestrian) must be less than $p_{ma}$. Here, the parameters
$p_{mc}$, $p_{fp}$ and $p_{ma}$ must be derived in conjunction with the control
system that uses the output from PD to plan the vehicle trajectory.
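As a sketch of how test evidence for RQ1-RQ3 might be computed, the code below derives the three rates from labelled (ground truth, prediction) pairs and compares them with the thresholds. The encoding of results, the choice of denominators, the toy data and the threshold values are all assumptions made for illustration; the text does not fix them.

```python
def pd_rates(results):
    """Empirical rates from (truth, prediction) pairs, each in {"pedestrian", "other", "none"}.

    "none" as truth means no object; as prediction it means nothing was detected.
    The denominators below are an assumption about how the rates are defined.
    """
    ped = [pred for truth, pred in results if truth == "pedestrian"]
    non_ped = [pred for truth, pred in results if truth != "pedestrian"]
    return {
        "misclassification": sum(p == "other" for p in ped) / len(ped),           # RQ1
        "false_positive": sum(p == "pedestrian" for p in non_ped) / len(non_ped),  # RQ2
        "missed_detection": sum(p == "none" for p in ped) / len(ped),             # RQ3
    }


def check_requirements(rates, p_mc, p_fp, p_ma):
    """Compare the empirical rates against the thresholds of RQ1-RQ3."""
    return {
        "RQ1": rates["misclassification"] < p_mc,
        "RQ2": rates["false_positive"] < p_fp,
        "RQ3": rates["missed_detection"] < p_ma,
    }


# Toy test set: 100 pedestrians and 200 non-pedestrian items.
results = ([("pedestrian", "pedestrian")] * 95 + [("pedestrian", "other")] * 2
           + [("pedestrian", "none")] * 3
           + [("other", "other")] * 198 + [("none", "pedestrian")] * 2)
rates = pd_rates(results)
print(rates)
print(check_requirements(rates, p_mc=0.05, p_fp=0.005, p_ma=0.05))
```

Point estimates from a finite test set say nothing about sampling error; a safety argument would more plausibly rely on an upper confidence bound for each rate, which is one reason testing alone gives limited assurance.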

The partially developed safety case for PD is shown in Fig. 6. The three safety 
requirements are addressed via the strategy Str1 and, as expected, testing results 
are given as evidence of their satisfaction. However, since testing can only provide 
limited assurance about the behaviour of PD in operation, we use an additional 
strategy, Str2, to argue that a rigorous method was followed to develop PD. 
Specifically, we follow the framework of [12] for identifying the factors that lead 
to uncertainty in ML-based perceptual software such as PD. 

The framework is defined at a high level in Fig. 7. The left “perception trian- 
gle” shows how the perceptual concept (in the case of PD, the concept “pedes- 
trian”) can occur in various scenarios in the world, how it is detected using 
sensors such as cameras, and how this can be used to collect and label exam- 
ples in order to train an ML component to learn the concept. The perception 
triangle on the right is similar but shows how the trained ML component can be 
used during the system operation to make inferences (e.g., perform the pedes- 
trian detection). The framework identifies seven factors that could contribute to 
uncertainty in the behaviour of the perceptual component. A safety case demon- 
strating a rigorous development process should provide evidence that each factor 
has been addressed. 

In Fig. 6, strategy Str2 uses the framework to argue that the seven factors are 
adequately addressed for PD. We illustrate development of two of these factors 
here. Scenario coverage (Goal G-F2) deals with the fact that the training data
must represent the concept in a sufficient variety of scenarios in which it could 
occur in order for the training to be effective. The argument here first decom- 
poses this goal into different types of variation (Str3) and provides appropriate 
evidence for each. The adequacy of age and ethnicity variation in the data set is 
supported by census data (S2) about the range of these dimensions of variation 
in the population. The variation in the pedestrian pose (i.e., standing, leaning, 
crouching, etc.) is supplied by a standard ontology of human postures (S3). 
Finally, evidence that the types are adequate to provide sufficient coverage of 
variation (completeness) is provided by an expert review (S4). 
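One piece of the evidence for Str3 could be a mechanical check that every value of each dimension of variation appears in the training data. The sketch below assumes per-example metadata tags and a reference list per dimension; the values shown are hypothetical, standing in for census categories and an ontology of postures. It checks marginal coverage only; the completeness of the list of dimensions itself remains the subject of the expert review (S4).

```python
from collections import Counter


def coverage_gaps(samples, reference, min_count=1):
    """Per dimension of variation, list the reference values that occur fewer
    than min_count times in the training-set metadata.

    samples:   list of per-example metadata dicts, e.g. {"pose": "crouching"}
    reference: dict mapping a dimension to the values it should cover
    """
    gaps = {}
    for dim, required in reference.items():
        counts = Counter(s.get(dim) for s in samples)
        missing = sorted(v for v in required if counts[v] < min_count)
        if missing:
            gaps[dim] = missing
    return gaps


reference = {"age_group": ["child", "adult", "senior"],
             "pose": ["standing", "leaning", "crouching", "sitting"]}
samples = [{"age_group": "adult", "pose": "standing"},
           {"age_group": "child", "pose": "standing"},
           {"age_group": "adult", "pose": "crouching"}]
print(coverage_gaps(samples, reference))
# {'age_group': ['senior'], 'pose': ['leaning', 'sitting']}
```

Raising `min_count` turns the check from "each value is present" into "each value is represented often enough", which is closer to what effective training requires.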

Another contributing factor developed in Fig. 6 is model uncertainty (Goal
G-F6). Since there is only finite training data, there can be many possible models 
that are equally consistent with the training data, and the training process could 
produce any one of them, i.e., there is residual uncertainty whether the produced 
model is in fact correct. The presence of model uncertainty means that while the 
trained model may perform well on inputs similar to the training data, there is no 
guarantee that it will produce the right output for other inputs. Some evidence of 
good behaviour here can be gathered if there are known properties that partially 
characterize the concept and can be checked. For example, a reasonable necessary 
condition for PD is that the object being classified as a pedestrian should be 
less than 9 ft tall. Another useful property type is an invariant, e.g., a rotated 
pedestrian image is still a pedestrian. Tools for property checking of neural 
networks (e.g., [37]) can provide this kind of evidence (S5). Another way to 
deal with model uncertainty is to estimate it directly. Bayesian deep learning 
approaches [22] can do this by measuring the degree of disagreement between 
multiple trained models that are equally consistent with the training data. The
more the models agree about how to classify a new input, the less model
uncertainty is present and the more confident one can be in the prediction.
Using this approach on a test data set can provide evidence (S6) about the
degree of uncertainty in the model. This approach can also be used during
operation to generate a confidence score for each prediction and to drive a
fault-tolerance strategy that takes a conservative action when the confidence
falls below a threshold.
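The sketch below illustrates both kinds of evidence under simplifying assumptions. Disagreement is measured over a small ensemble of independently trained models (a deep-ensemble style approximation rather than a full Bayesian treatment), and a hypothetical `decide` wrapper falls back to a conservative action when agreement is low. The height property is checked on test outputs, which is a testing-style approximation rather than the formal network verification performed by tools such as [37]. Model callables, thresholds and labels are illustrative only.

```python
from statistics import pstdev


def ensemble_vote(models, x, threshold=0.5):
    """models: callables returning P(pedestrian | x). Returns the majority label,
    the fraction of models agreeing with it, and the spread of probabilities."""
    probs = [m(x) for m in models]
    votes = [p >= threshold for p in probs]
    is_pedestrian = 2 * sum(votes) >= len(votes)  # ties resolved toward "pedestrian"
    agreement = sum(v == is_pedestrian for v in votes) / len(votes)
    return is_pedestrian, agreement, pstdev(probs)


def decide(models, x, min_agreement=0.8):
    """Fault-tolerance wrapper: fall back to a conservative action when the
    ensemble disagrees too much."""
    is_pedestrian, agreement, _ = ensemble_vote(models, x)
    if agreement < min_agreement:
        return "conservative"  # e.g., treat as a possible pedestrian and slow down
    return "pedestrian" if is_pedestrian else "other"


def height_violations(detections, max_height_ft=9.0):
    """Necessary condition from the text: anything labelled a pedestrian should be
    less than 9 ft tall. Returns the detections that violate it."""
    return [d for d in detections
            if d["label"] == "pedestrian" and d["height_ft"] >= max_height_ft]


# Toy stand-ins for independently trained models (constant outputs).
agreeing = [lambda x, p=p: p for p in (0.9, 0.8, 0.85, 0.95, 0.7)]
split = [lambda x, p=p: p for p in (0.9, 0.2, 0.7, 0.3, 0.6)]
print(decide(agreeing, x=None), decide(split, x=None))
print(height_violations([{"label": "pedestrian", "height_ft": 5.8},
                         {"label": "pedestrian", "height_ft": 11.0}]))
```

Counting votes keeps the confidence score easy to interpret; the returned standard deviation of the probabilities is an alternative score that also reflects how close individual models are to the decision threshold.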


7 Summary and Future Outlook 


In this paper, we tried to argue that an assurance case view on establishing 
system correctness provides a way to unify different components of the soft- 
ware development process and to explicitly manage uncertainty. Furthermore, 
although our examples came from the world of safety-critical automotive
systems, the assurance case view is broadly applicable to a variety of systems,
not just those in the safety-critical domain, including those constructed by
nontraditional means such as ML. This view is especially relevant to much of the
research activity being conducted by the ETAPS community since it allows us, in
principle, to understand how each method contributes to the overall problem of
system assurance.
Most traditional assurance methods aim to build an informal argument,
ultimately judged by a human. However, while these are useful for showing
compliance with standards and are relatively easy to construct and read, such arguments
may not be rigorous, missing essential properties such as completeness, indepen- 
dence, relevance, or a clear statement of assumptions [51]. As a result, fallacies in 
existing assurance cases are present in abundance [28]. To address this weakness, 
we argued that building assurance cases should adhere to systematic principles 
that ensure rigor. Of course, not all arguments can be fully deductive since
relevance and admissibility of evidence are often based on human judgment. Yet,
an explicit modeling and management of uncertainty in evidence, specifications,
and assumptions as well as the clear justification of each step can go a long way
toward making such arguments valid, reusable, and generally useful in helping 
produce high quality software systems. 


Challenges and Opportunities. Achieving this vision has a number of chal- 
lenges and opportunities. In our work on impact assessment of model change on 
assurance cases [39,40], we note that even small changes to the system may have 
significant impact on the assurance case. Because creation of an assurance case 
is costly, this brittleness must be addressed. One opportunity here is to recog- 
nize that assurance cases can be refactored to improve their qualities without 
affecting their semantics. For example, in Sect. 4, we showed that the LMS safety 
claim could either be decomposed first by hazards and then by subsystems or 
vice versa. Thus, we may want to choose the order of decomposition based on 
other goals, e.g., to minimize the impact of change on the assurance case by 
pushing the affected subgoals lower in the tree. Another issue is that complex 
systems yield correspondingly complex assurance cases. Since these must ulti- 
mately be judged by humans, we must manage the cognitive load the assurance 
case puts on the assessor. This creates opportunities for mechanized support, 
both in terms of querying, navigating and analyzing assurance cases as well as 
in terms of modularization and reuse of assurance cases. 

Evidence composition discussed in Sect. 5 also presents significant challenges. 
While standards such as DO-178C and ISO 26262 give recommendations on the
use of testing and verification, it is not clear how to compose partial evidence or 
how to use results of one analysis to support another. Focusing on how each tech- 
nique reduces potential faults in the program, clearly documenting their context 
of applicability (e.g., the small model hypothesis justifying partial unrolling of 
loops, properties not affected by approximations of complex program operations 
and datatypes often done by model-checkers, connections between the modeled 
and the actual environment, etc.) and ultimately connecting them to reducing 
uncertainties about whether the system satisfies the essential property are key
to making tangible progress in this area.

Finally, in Sect. 6, we showed how the assurance case view could apply to new 
development approaches such as ML. Although such new approaches provide 
benefits over traditional software development, they also create challenges for 
assurance. One challenge is that analysis techniques used for verification may 
be immature. For example, while neural networks have been studied since the 
1950s, pragmatic approaches to their verification have been investigated only
recently [53]. Another issue is that prerequisites for assurance may not be met 
by the development approach. For example, although they are expressive, neural 
networks suffer from uninterpretability [41]; that is, it is not feasible for a
human to examine a trained network and understand what it is doing. This is a
serious obstacle to assurance because formal and automated methods account for
only part of the verification process, which is augmented by human reviews. As a result, increasing
the interpretability of ML models is an active area of current research. 

While all these challenges are significant, the benefit of addressing them is 
worth the effort. As our world moves towards increasing automation, we must 
develop approaches for assuring the dependability of the complex systems we 
build. Without this, we either stall progress or run the risk of endangering our- 
selves — neither alternative seems desirable. 
Abstract. Correctness-by-Construction (CbC) is an approach to incrementally
create formally correct programs guided by pre- and postcondition
specifications. A program is created using refinement rules that
guarantee the resulting implementation is correct with respect to the 
specification. Although CbC is supposed to lead to code with a low defect 
rate, it is not prevalent, especially because appropriate tool support is 
missing. To promote CbC, we provide tool support for CbC-based pro- 
gram development. We present CorC, a graphical and textual IDE to 
create programs in a simple while-language following the CbC approach. 
Starting with a specification, our open source tool supports CbC devel- 
opers in refining a program by a sequence of refinement steps and in 
verifying the correctness of these refinement steps using the theorem 
prover KeY. We evaluated the tool with a set of standard examples on 
CbC where we reveal errors in the provided specification. The evalua- 
tion shows that our tool reduces the verification time in comparison to 
post-hoc verification. 


1 Introduction 


Correctness-by-Construction (CbC) [12,13,19, 23] is a methodology to construct 
formally correct programs guided by a specification. CbC can improve program 
development because every part of the program is designed to meet the
corresponding specification. With the CbC approach, source code is incrementally
constructed with a low defect rate [19] for three main reasons. First,
introducing defects is hard because of the structured reasoning discipline that is 
enforced by the refinement rules. Second, if defects occur, they can be tracked 
through the refinement structure of specifications. Third, the trust in the pro- 
gram is increased because the program is developed following a formal pro- 
cess [14]. 

Despite these benefits, CbC is still not prevalent and not applied for large- 
scale program development. We argue that one reason for this is missing tool 


© The Author(s) 2019 
R. Hähnle and W. van der Aalst (Eds.): FASE 2019, LNCS 11424, pp. 25-42, 2019. 
https://doi.org/10.1007/978-3-030-16722-6_2 


26 T. Runge et al. 


support for a CbC-style development process. Another issue is that the pro- 
grammer mindset is often tailored to the prevalent post-hoc verification app- 
roach. CbC has been shown to be beneficial even in domains where post-hoc 
verification is required [29]. In post-hoc verification, a method is verified against 
pre- and postconditions. In the CbC approach, we refine the method stepwise, 
and we can check the method partially after each step since every statement 
is surrounded by a pair of pre- and postconditions. The verification of refine- 
ment steps and Hoare triples reduces the proof complexity since the proof task 
is split into smaller problems. The specifications and code developed using the 
CbC approach can be used to bootstrap the post-hoc verification process and 
allow for an easier post-hoc verification as the method constructed using CbC 
generally is of a structure that is more amenable to verification [29]. 

In this paper, we present CorC,¹ a tool designed to develop programs following
the CbC approach. We deliberately built our tool on the well-known post-hoc
verifier KeY [4] to profit from the KeY ecosystem and future extensions of the 
verifier. We also add CbC as another application area to KeY, which opens the 
possibility for KeY users to adopt the CbC approach. This could spread the 
constructive CbC approach to areas where post-hoc verification is prevalent. 

Our tool CorC offers a hybrid textual-graphical editor to develop programs 
using CbC. The textual editor resembles a normal programming editor, but 
is enriched with support for pre- and postcondition specifications. The graphi- 
cal editor visualizes the code, its specification, and the program refinements in 
a tree-like structure. The developers can switch back and forth between both 
views. In order to support the correct application of the refinement rules, the 
tool is integrated with KeY [4] such that proof obligations can be immediately 
discharged during program development. In a preliminary evaluation, we found 
benefits of CorC compared to paper-and-pencil-based application of CbC and 
compared to post-hoc verification. 


2 Foundations of Correctness-by-Construction 


Classically, CbC [19] starts with the specification of a program as a Hoare triple 
comprising a precondition, an abstract statement, and a postcondition. Such a 
triple, say T, should be read as a total correctness assertion: if the abstract
statement of T is executed in a state where the precondition holds, then the
execution will terminate and the postcondition will hold. T will be true for a
certain set of concrete program instantiations of the abstract program and false
for other instantiations. A refinement of T is a triple, say T’, which is true for
a subset of the concrete programs that make T true.

In our work, pre-/post-condition specifications for programs are written in 
first-order logic (FOL). A formula in FOL consists of atomic formulas which are 
logically connected. An atomic formula is a predicate which evaluates to true or
false. Programs in this work are written in the CorC language, which is inspired
by the Guarded Command Language (GCL) [11] and presented below.

$\{P\}\ S\ \{Q\}$ can be refined to

1. Skip: $\{P\}\ \mathtt{skip}\ \{Q\}$ iff $P$ implies $Q$

2. Assignment: $\{P\}\ x := E\ \{Q\}$ iff $P$ implies $Q[x := E]$

3. Composition: $\{P\}\ S_1; S_2\ \{Q\}$ iff there is an intermediate condition $M$
such that $\{P\}\ S_1\ \{M\}$ and $\{M\}\ S_2\ \{Q\}$

4. Selection: $\{P\}\ \mathtt{if}\ G_1 \rightarrow S_1\ \mathtt{elseif} \ldots G_n \rightarrow S_n\ \mathtt{fi}\ \{Q\}$
iff ($P$ implies $G_1 \vee G_2 \vee \ldots \vee G_n$) and $\{P \wedge G_i\}\ S_i\ \{Q\}$ holds for all $i$

5. Repetition: $\{P\}\ \mathtt{do}\ [I, V]\ G \rightarrow S\ \mathtt{od}\ \{Q\}$ iff ($P$ implies $I$)
and ($I \wedge \neg G$ implies $Q$) and $\{I \wedge G\}\ S\ \{I\}$
and $\{I \wedge G \wedge V = V_0\}\ S\ \{I \wedge 0 \le V \wedge V < V_0\}$

6. Weaken pre: $\{P'\}\ S\ \{Q\}$ iff $P$ implies $P'$

7. Strengthen post: $\{P\}\ S\ \{Q'\}$ iff $Q'$ implies $Q$

8. Subroutine: $\{P\}\ \mathit{Sub}\ \{Q\}$ with subroutine $\{P'\}\ \mathit{Sub}\ \{Q'\}$
iff $P$ is equal to $P'$ and $Q'$ is equal to $Q$

Fig. 1. Refinement rules in CbC [19]

For the concrete instantiation of conditions and assignments, our tool uses a 
host language. We decided on Java, but other languages are also possible.

To create programs using CbC, we use refinement rules. A Hoare triple is 
refined by applying rules, which introduce CorC language statements, so that 
a concrete program is created. The concrete program obtained by refinement 
is guaranteed to be correct by construction, provided that the correctness- 
preserving refinement steps have been accurately applied. In Fig. 1, we present 
the statements and refinement rules used in CbC and our tool. 


Skip. A skip or empty statement is a statement that does not alter the state of 
the program (i.e., it does nothing) [11,19]. This means a Hoare triple with a skip 
statement evaluates to true if the precondition implies the postcondition. 


Assignment. An assignment statement assigns an expression of type T to a vari- 
able, also of type T. In the tool, we use a Java-like assignment (x = y). To refine 
a Hoare triple {P} S {Q} with an assignment statement, the assignment rule is 
used. This rule replaces the abstract statement S by an assignment {P} x = E {Q} 
iff P implies Q[x := E]. 
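KeY discharges the proof obligation "P implies Q[x := E]" symbolically. For intuition only, the sketch below checks it by enumerating integer states in a small bounded range, which can find counterexamples but cannot prove validity. Predicates and expressions are modelled as Python functions over a state dictionary; the helper names `states` and `assignment_ok` are hypothetical.

```python
from itertools import product


def states(variables, lo=-3, hi=3):
    """All integer states over a small bounded domain (a test, not a proof)."""
    for values in product(range(lo, hi + 1), repeat=len(variables)):
        yield dict(zip(variables, values))


def assignment_ok(P, x, E, Q, variables):
    """Bounded check of the assignment rule: P implies Q[x := E].

    Q[x := E] is evaluated semantically, by running Q on the state in which
    x has been updated to the value of E; for side-effect-free E this agrees
    with syntactic substitution.
    """
    for s in states(variables):
        if P(s) and not Q({**s, x: E(s)}):
            return False, s  # counterexample state
    return True, None


# {x > 0} y = x + 1 {y > 1}: valid
print(assignment_ok(lambda s: s["x"] > 0, "y", lambda s: s["x"] + 1,
                    lambda s: s["y"] > 1, ["x", "y"]))
# {true} y = x {y > 0}: invalid
print(assignment_ok(lambda s: True, "y", lambda s: s["x"],
                    lambda s: s["y"] > 0, ["x", "y"]))
```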


Composition. A composition statement is a statement which splits one abstract 
statement into two. A Hoare triple {P} S {Q} is split into $\{P\}\ S_1\ \{M\}$
and $\{M\}\ S_2\ \{Q\}$, in which S is refined to $S_1; S_2$. M is an intermediate
condition which evaluates to true after $S_1$ and before $S_2$ is executed [11].


Selection. Selection in our CorC language works as a switch statement. It refines
a Hoare triple {P} S {Q} to
$\{P\}\ \mathtt{if}\ G_1 \rightarrow S_1\ \mathtt{elseif} \ldots G_n \rightarrow S_n\ \mathtt{fi}\ \{Q\}$.
The guards $G_i$ are evaluated, and the sub-statement $S_i$ of the first satisfied
guard is executed.




We use a switch-like statement so that every sub-statement has an associated 
guard for further reasoning. The selection refinement rule can only be used if 
the precondition P implies the disjunction of all guards so that at least one 
sub-statement could be executed. 
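The two parts of the selection rule, guard coverage and per-branch correctness, can be illustrated with the same kind of bounded check, together with the first-satisfied-guard execution described above. This is a sketch under stated assumptions (small integer domain, statements modelled as state-to-state functions, hypothetical helper names and example), not how CorC verifies selections.

```python
from itertools import product


def states(variables, lo=-3, hi=3):
    """All integer states over a small bounded domain (a test, not a proof)."""
    for values in product(range(lo, hi + 1), repeat=len(variables)):
        yield dict(zip(variables, values))


def selection_ok(P, branches, Q, variables):
    """Bounded check of the selection rule.

    branches: list of (guard, statement) pairs; a statement maps a state to a
    new state. Checks that P implies G1 or ... or Gn, and that every branch
    whose guard holds establishes Q (which also covers first-guard semantics).
    """
    for s in states(variables):
        if not P(s):
            continue
        if not any(g(s) for g, _ in branches):
            return False  # no sub-statement could be executed
        if any(g(s) and not Q(stmt(s)) for g, stmt in branches):
            return False
    return True


def run_selection(branches, s):
    """Execute the sub-statement of the first satisfied guard."""
    for g, stmt in branches:
        if g(s):
            return stmt(s)
    raise RuntimeError("no guard satisfied")


# if a >= b -> m = a elseif b >= a -> m = b fi, with postcondition m == max(a, b)
branches = [(lambda s: s["a"] >= s["b"], lambda s: {**s, "m": s["a"]}),
            (lambda s: s["b"] >= s["a"], lambda s: {**s, "m": s["b"]})]


def post_max(s):
    return s["m"] == max(s["a"], s["b"])


print(selection_ok(lambda s: True, branches, post_max, ["a", "b"]))      # True
print(selection_ok(lambda s: True, branches[:1], post_max, ["a", "b"]))  # False
print(run_selection(branches, {"a": 2, "b": 5})["m"])                    # 5
```

Dropping the second branch fails the coverage condition (for example when a < b), which is exactly the situation the rule excludes.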


Repetition. The repetition statement
$\{P\}\ \mathtt{do}\ [I, V]\ G \rightarrow S\ \mathtt{od}\ \{Q\}$ works like a
while loop in other languages. If the loop guard G evaluates to true, the
associated loop statement S is executed. The repetition statement is specified
with an invariant I and a variant V. To refine a Hoare triple {P} S {Q} with a
repetition statement, (1) the precondition P has to imply the invariant I of the
repetition statement, (2) the conjunction of the invariant and the negation of
the loop guard G has to imply the postcondition Q, and (3) the loop body has to
preserve the invariant by showing that $\{I \wedge G\}\ S\ \{I\}$ holds. To
verify termination, we have to show that the variant V strictly decreases in
each loop iteration and has 0 as a lower bound.
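As a complement to the proof obligations, the sketch below executes a repetition statement on a concrete state and asserts the conditions that are observable at run time: the invariant on entry and after every iteration, and a variant that strictly decreases while staying non-negative. It is a hypothetical testing aid, not part of CorC; a single run cannot establish the rule's obligations for all states. The summation example is ours.

```python
def run_loop(state, inv, variant, guard, body, max_iter=10_000):
    """Execute do [inv, variant] guard -> body od on one concrete state and
    check the observable obligations of the repetition rule."""
    assert inv(state), "invariant does not hold on entry"
    for _ in range(max_iter):
        if not guard(state):
            return state  # here inv and not guard hold; the caller checks Q
        v0 = variant(state)
        state = body(state)
        assert inv(state), "body does not preserve the invariant"
        v = variant(state)
        assert 0 <= v < v0, "variant does not decrease towards 0"
    raise RuntimeError("iteration bound reached")


# Example: s = 1 + 2 + ... + n with invariant s == i*(i+1)/2 and 0 <= i <= n.
n = 5
final = run_loop(
    {"i": 0, "s": 0},
    inv=lambda st: st["s"] == st["i"] * (st["i"] + 1) // 2 and 0 <= st["i"] <= n,
    variant=lambda st: n - st["i"],
    guard=lambda st: st["i"] < n,
    body=lambda st: {"i": st["i"] + 1, "s": st["s"] + st["i"] + 1},
)
print(final["s"])  # 15
```

On exit, the invariant together with the negated guard (i >= n) gives i == n and hence s == n*(n+1)/2, mirroring obligation (2) of the rule.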


Weaken precondition. The precondition of a Hoare triple can be weakened if 
necessary. The weaken precondition rule replaces the precondition P with a new 
one P’ only if P implies P’ [12]. 


Strengthen postcondition. To strengthen a postcondition, the strengthen post- 
condition rule can be used. A postcondition Q is replaced by a new one Q’ only 
if Q’ implies Q [12]. 


Subroutine. A subroutine can be used to split a program into smaller parts. We 
use a simple subroutine call where we prohibit side effects and parameters. A 
triple {P} S {Q} can be refined to a subroutine {P’} Sub {Q’}, if the precondition 
P’ of the subroutine is equal to the precondition P of the refined statement and the 
postcondition Q’ of the subroutine is equal to the postcondition Q of the refined 
statement. The subroutine can be constructed as a separate CbC program to 
verify that it satisfies the specification. The Hoare triple {P’} Sub {Q’} is the 
starting point to construct a program using CbC. 


3 Correctness-by-Construction by Example


To introduce the programming style of CbC, we demonstrate the construction 
of a linear search algorithm using CbC [19]. The linear search problem is defined 
as follows: We have an integer array a of some length, and an integer variable 
x. We try to find an element in the array a which has the same value as the 
variable x, and we return the index i where the (last) element x was found, or
-1 if the element is not in the array.

To construct the algorithm, we start with concretizing the pre- and
postcondition of the algorithm. Before the algorithm is executed, we know that
we have an integer array. Therefore, we specify
$P := a \neq \mathit{null} \wedge a.\mathit{length} \geq 0$ as precondition. The
postcondition requires that if the index i is greater than or equal to zero, the
element is found at the returned index i ($Q := (i \geq 0 \Rightarrow a[i] = x)$).
$\{P\}\ \mathit{st}\ \{Q\}$
  ① composition for st: $\{P\}\ \mathit{st1}\ \{I\}$ and $\{I\}\ \mathit{st2}\ \{Q\}$
  ② assignment for st1: $\{P\}\ i = a.\mathit{length} - 1\ \{I\}$
  ③ repetition for st2: $\{I\}\ \mathtt{do}\ [I, V]\ G \rightarrow \mathit{loopSt}\ \mathtt{od}\ \{Q\}$
  ④ assignment for loopSt: $\{I \wedge G\}\ i = i - 1\ \{I\}$

Fig. 2. Refinement steps for the linear search algorithm


Our algorithm traverses the array in reverse order and checks for each index 
whether the value is equal to x. In this case, the index is returned. To create 
this algorithm, we construct an invariant I for the loop: 


$I := \neg\mathit{appears}(a, x, i + 1, a.\mathit{length}) \wedge i \geq -1 \wedge i < a.\mathit{length}$


The invariant is used to split the array into two parts: a part from i + 1 to
a.length where x is not contained, and a part from zero to i which is not
checked yet. In every iteration, the next index of the array is checked. The
predicate appears(a, x, l, h) asserts that x occurs in array a inside the range
from l (included) to h (excluded). The predicate can be translated to FOL as
$\exists i\colon (i \geq l \wedge i < h \wedge a[i] = x)$.

We can use the CbC refinement rules to implement linear search. The refinement
steps for the example are shown in Fig. 2 and numbered from ① to ④. To create
a loop in the program, we need to initialize a loop counter variable to
establish the invariant. Therefore, we split the program by introducing a
composition statement (① in Fig. 2). The invariant I is used as intermediate
condition (i.e., M := I), because it has to be true after the initialization and
before the first loop step. The statement st1 is refined to an assignment
statement (②). We initialize i with a.length - 1 to start at the end of the
array. This assignment satisfies the intermediate condition I where i is
replaced by a.length - 1: the range of appears is then empty, so appears
evaluates to false and its negation to true. To refine the second statement
(st2), we use the repetition refinement rule (③). As long as x is not found, we
iterate through the array. As guard of the repetition, we use
$G := (i \geq 0 \wedge a[i] \neq x)$. The invariant of the repetition is the
invariant I introduced above. The variant V is i + 1. To verify that this
refinement is valid, we have to verify that the precondition of the repetition
statement implies the invariant, and that the invariant and the negated guard
imply the postcondition of the repetition (cf. Rule 5). Both are valid because
the precondition is equal to the invariant and the postcondition of the
repetition statement (in this case it is Q) is equivalent to the negated guard.
The last step is to refine the abstract loop statement (loopSt) (④). We use an
assignment to decrease i and get the final program. We can verify that the
invariant holds after each loop iteration. The program terminates because the
variant decreases in every step and is always greater than or equal to zero.
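For readers who want to execute the derived program, here is a Python transliteration (our choice of language; CorC itself uses Java as host language) with the specification of the example turned into run-time assertions. The step numbers in the comments refer to the refinement steps of Fig. 2. The assertions only check the runs that are actually executed; the correctness guarantee comes from the verified refinement steps, not from this sketch.

```python
def appears(a, x, lo, hi):
    """appears(a, x, l, h): exists i with l <= i < h and a[i] == x."""
    return any(a[j] == x for j in range(lo, hi))


def linear_search(a, x):
    """The program derived in Fig. 2, with its specification checked at run time."""
    assert a is not None  # precondition P (a.length >= 0 holds for any list)

    def inv():  # invariant I, evaluated on the current value of i
        return not appears(a, x, i + 1, len(a)) and i >= -1 and i < len(a)

    i = len(a) - 1               # step 2: assignment for st1 establishes I
    assert inv()
    while i >= 0 and a[i] != x:  # step 3: repetition with guard G
        v0 = i + 1               # variant V = i + 1
        i = i - 1                # step 4: assignment for loopSt
        assert inv() and 0 <= i + 1 < v0
    assert i < 0 or a[i] == x    # postcondition Q: i >= 0 implies a[i] == x
    return i


print(linear_search([3, 1, 4, 1, 5], 1))  # 3 (the last occurrence)
print(linear_search([3, 1, 4], 9))        # -1
```

Because the search runs from the end of the array, the returned index is that of the last occurrence of x, as stated in the problem definition.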


4 Tool Support in CorC 


CorC extends KeY’s application area by enabling CbC to spread the constructive 
engineering to areas where post-hoc verification is prevalent. KeY programmers 
can use both approaches to construct formally correct programs. By using CorC, 
they develop specification and code that can bootstrap the post-hoc verification. 
The CorC tool² is realized as an Eclipse plug-in in Java. We use the Eclipse
Modeling Framework (EMF)³ to specify a CbC meta model. This meta model
is used by two editor views, a textual and a graphical editor. The Hoare triple 
verification is implemented by the deductive program verification tool KeY [4]. 
In the following list, we summarize the features of CorC. 


— Programs are written as Hoare triple specifications, including pre-/postcondi- 
tion specifications and abstract statements or assignment /skip statements in 
concrete triples. 

— CorC has eight rules to construct programs: skip, assignment, composition, 
selection, repetition, weakening precondition, strengthening postcondition, 
and subroutine (cf. Sect. 2). 

— Pre-/postconditions and invariant specification are automatically propagated 
through the program. 

— CorC comprises a graphical and a textual editor that can be used 
interchangeably. 

— Currently, CorC supports integers, chars, strings, arrays, and subroutine
calls, but not side effects, I/O, or library calls.

— Hoare triples are typically verified by KeY automatically. If the proof cannot 
be closed automatically, the user can interact with KeY. 

— Helper methods written in Java 1.5 can be used in a specification. 

— CorC provides content assist and automatic generation of intermediate
conditions.


4.1 Graphical Editor 


The graphical editor represents CbC-based program refinement by a tree structure.
A node represents the Hoare triple of a specific CorC language statement.
Figure 3 presents the linear search algorithm of Sect. 3 in the graphical editor.
The structure of the tree is the same as in Fig. 2. The additional nodes on the
right specify the program variables used, including their types, and the global invariant


² https://github.com/TUBS-ISF/CorC.
³ https://eclipse.org/emf/.
[Figure: Formula root node with Variables and Global Conditions nodes, a Composition node, a Repetition Statement DO...OD node, and two assignment leaf nodes]

Fig. 3. Linear search example in the graphical editor


conditions. The global invariant conditions are added to every pre- and post- 
condition of Hoare triples to simplify the construction of the program. In the 
example, we specify that the array a is not null and restrict the range of the
variable i, as KeY requires this range to be explicit for verification.

The root node of the tree shows the abstract Hoare triple for the overall 
program with a symbolic name for the abstract statement. In every node, the 
pre- and postcondition are specified on the left and right of the node under the 
corresponding header. A composition statement node, the second node of
the tree, contains the pre- and postcondition and additionally defines an
intermediate condition. The intermediate condition is the middle term in the bottom
line. Both abstract sub-statements of the composition have a symbolic name and
can be further refined by adding a connection to another node (i.e., creating a
parent-child relation). The repetition node contains fields to specify the invariant,
the guard, and the variant of the repetition. These fields are in the middle
row. The pre- and postcondition are associated with the inner loop statement. An
assignment node (cf. both leaf nodes of the figure) contains the precondition,
the assignment, and the postcondition. The representations of the nodes for the
refinements not illustrated in this example are similar.
Refinement steps are represented by edges. The pre- and postconditions are
propagated from parents to their children when the parent/child relation is drawn.
We explicitly show the propagated conditions in a node to improve readability.
The propagated conditions from the parent cannot be modified because the refinement
rules determine explicitly how conditions are propagated. The exceptions are the
rules to weaken the precondition or strengthen the postcondition. Here, the
conditions can be overridden. At the repetition statement, we only depict the 
pre-/postconditions of the inner loop statement to reduce the size of this node. 
The pre-/postconditions of the parent node (in our example the composition 
statement) are not shown explicitly, but they are propagated internally to verify 
that the repetition refinement rule is satisfied. To visualize the verification status, 
the nodes have a green border if proven, a red one otherwise. 

By showing the Hoare triples explicitly, problems in the program can be
localized. If some leaf node cannot be proven, the user has to check the assignment
and the corresponding pre-/postcondition. If an error is found, only the conditions
on the refinement path from that leaf up to the pre-/postcondition of the starting
Hoare triple may need to be altered; other paths do not need to be checked. To prove the program correct,
we have to prove that the refinement is correct. Aside from the side conditions 
of refinement rules (cf. iff conditions in refinement rules), only the leaf nodes of 
the refinement tree which contain basic Hoare triples with skip or assignment 
statements need to be verified by a prover, while all composite statements are 
correct by construction of their conditions. 
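The following Python sketch illustrates both points on the linear search example: conditions are copied from a parent to its children when a refinement is applied, and only the skip/assignment leaves become proof obligations. The Node class and the refine_* helpers are hypothetical and do not mirror CorC's actual meta model; the side conditions of the repetition rule are deliberately not collected here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A Hoare triple {pre} S {post}; kind records how S was refined."""
    pre: str
    post: str
    kind: str = "abstract"
    code: Optional[str] = None            # concrete code of skip/assignment leaves
    children: List["Node"] = field(default_factory=list)


def refine_composition(node: Node, intm: str) -> Tuple[Node, Node]:
    """{P} S {Q} becomes {P} S1 {M}; {M} S2 {Q}. P and Q are copied, not edited."""
    node.kind = "composition"
    node.children = [Node(node.pre, intm), Node(intm, node.post)]
    return node.children[0], node.children[1]


def refine_repetition(node: Node, inv: str, guard: str) -> Node:
    """The loop body gets {inv & guard} S {inv}; side conditions are not tracked here."""
    node.kind = "repetition"
    node.children = [Node(f"({inv}) & ({guard})", inv)]
    return node.children[0]


def refine_assignment(node: Node, code: str) -> None:
    node.kind = "assignment"
    node.code = code


def leaf_obligations(node: Node) -> List[Tuple[str, str, str]]:
    """Collect the concrete triples (skip/assignment leaves) a prover must check."""
    if node.kind in ("skip", "assignment"):
        return [(node.pre, node.code, node.post)]
    return [t for child in node.children for t in leaf_obligations(child)]


# Linear search, built top-down in the same shape as the tree in Fig. 3
inv = "!appears(a, x, i+1, a.length)"
root = Node("true", "i>=0 -> a[i]=x")
first, second = refine_composition(root, inv)
refine_assignment(first, "i=a.length-1;")
body = refine_repetition(second, inv, "i>=0 & a[i]!=x")
refine_assignment(body, "i=i-1;")

obligations = leaf_obligations(root)
for triple in obligations:
    print(triple)
```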

To support the user in developing intermediate conditions for composition 
statements, our tool can compute the weakest precondition from a postcondition 
and a concrete assignment by using the KeY theorem prover. Thus, the user can
first create a concrete assignment statement and generate the intermediate
conditions afterwards. We also support modularization to cover cases where
algorithms become too large. Sub-algorithms can be created using CbC in other
CorC programs. We introduce a simple subroutine rule which can be used as a leaf
node in the editor. The subroutine has a name and is connected to a second diagram
with the same name as the subroutine. This subroutine call is similar to a classic
method call. It can be used to decompose larger CbC developments into multiple
smaller programs.
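For a single assignment, the weakest precondition follows from the textbook substitution rule, shown here as a worked example; we assume, without having checked KeY's internals, that KeY's computation agrees with it for such simple statements. Applied to the first statement of the linear search example, with appears(a, x, l, h) assumed to mean that x occurs in a[l..h-1], the computed condition is trivially true, so the overall precondition {true} suffices:

```latex
\begin{align*}
\mathit{wp}(v := e,\ Q) &= Q[v := e] \\
\mathit{wp}\big(i := a.length - 1,\ \neg\mathit{appears}(a, x, i+1, a.length)\big)
  &= \neg\mathit{appears}(a, x, (a.length - 1) + 1, a.length) \\
  &= \neg\mathit{appears}(a, x, a.length, a.length) \\
  &= \mathit{true}
\end{align*}
```

The last step holds because the range a[a.length..a.length-1] is empty.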


4.2 Textual Editor 


The textual editor is an editor for the CorC programming language described 
above. The user writes code by using keywords for the specific statements and 
enriches the code with conditions, such as invariants or intermediate conditions, 
and assignments in our CorC syntax. The syntax of the composed statements 
in the textual editor is shown in Fig. 4. In the GlobalConditions declaration,
we list the required global conditions, separated by commas. The variables used
are listed after the JavaVariables keyword.

The linear search example program presented in Sect. 3 is shown in the syntax 
of CorC in Listing 1. The program starts with the keyword Formula. The pre- and
postcondition of the abstract Hoare triple are written after the pre: and post: 
Selection statement:
if ("guard") then {statement}
elseif ("guard") then {statement}
fi

Repetition statement:
while ("guard") inv: ["invariant"] var: ["variant"]
do {statement} od


Fig. 4. Syntax of statements in textual editor


1   Formula "linearSearch"
2   pre: {"true"}
3   {
4     {
5       i=a.length-1;
6     }
7     intm: ["!appears(a, x, i+1, a.length)"]
8     {
9       while ("i>=0 & a[i]!=x")
10      inv: ["!appears(a, x, i+1, a.length)"]
11      var: ["i+1"] do
12      {
13        i=i-1;
14      } od
15    }
16  }
17  post: {"i>=0 -> a[i]=x"}
18
19  GlobalConditions
20    conditions {"a!=null", "a.length>=0",
21      "i>=-1", "i<a.length"}
22
23  JavaVariables
24    variables {"int[] a", "int x", "int i"}


Listing 1. Linear search example in the textual editor


keywords. The abstract statement of the Hoare triple is refined to a composition 
statement in lines 3-16. The statements are surrounded by curly brackets to 
establish the refinement structure. We have the first statement in lines 4-6, the 
intermediate condition in line 7 and the second statement in lines 8-15. The 
first statement is refined to an assignment (Line 5) by introducing an
assignment in Java syntax (i = a.length - 1;). The second
statement is refined to a repetition statement (cf. the syntax of a repetition 
statement in Fig. 4). We specify the guard, the invariant, and the variant. Finally, 
the single statement of the loop body is refined to an assignment in Line 13. 
As in the graphical editor, pre-/postconditions are propagated top-down from 
a parent to a child statement. For example, the intermediate condition of a 
\javaSource "src";
\include "helper.key";
\programVariables {int x;}
\problem {
  (x = 0) -> \<{x=x+1;}\> (x = 1)
}


Listing 2. KeY problem file


composition statement, which is the postcondition of the first sub-statement and
the precondition of the second, appears only once in the editor (e.g., Line 7). To
support the user, we implemented syntax highlighting and content assist. When
starting to write a statement, a user may employ auto-completion, which inserts
the statements following the syntax in Fig. 4. The user can then specify the
conditions and refine the next statement. The editor also automatically
checks the syntax and highlights syntax errors. Information markers are used to
indicate statements which are not proven yet. For example, the Hoare triple of
the assignment statement (i = a.length - 1) in Listing 1 has to be verified, and
CorC marks the statement according to the proof completion results.


4.3 Verification of CorC Programs 


To prove that the refined program is correct, we have to prove the side conditions
of the refinements (e.g., that an assignment satisfies its pre-/postcondition
specification). This reduces the proof complexity because the challenge to prove
a complete program is decomposed into smaller verification tasks. The interme- 
diate Hoare triples are verified indirectly through the soundness of the refine- 
ment rules and the propagation of the specifications from parent nodes to child 
nodes [19]. Side conditions occur in all refinements (cf. iff conditions in refinement 
rules). These side conditions, such as the termination of repetition statements 
or that at least one guard in a selection has to evaluate to true, are proven in 
separate KeY files. 

For the proof of concrete Hoare triples, we use the deductive program verifier
KeY [4]. Hoare triples are transformed into KeY’s dynamic logic syntax. The
syntax of KeY problem files is shown in Listing 2. Using the keyword javaSource,
we specify the path to Java helper methods which are called in the
specifications. These methods have to be verified independently with KeY. A KeY
helper file, in which users can define their own FOL predicates for the
specification, is included with the keyword include. For example, in CorC a predicate
appears(a,x,l,h) (cf. the linear search example) can be used which is specified
in the helper file as a FOL formula. The variables used in the program are listed
after the keyword programVariables. After the keyword problem, we define the Hoare
triple to be proven, translated to dynamic logic as used by KeY. The resulting
problem files are then verified by KeY. As we are only verifying simple Hoare
triples with skip or assignment statements, KeY is usually able to close the proofs
automatically if the Hoare triple is valid.

To verify total correctness of the program, we have to prove that all repe- 
tition statements terminate. The termination of repetition statements is shown 
by proving that the variants in the program strictly decrease in every iteration
and are bounded from below. Without loss of generality, we assume this bound to equal 0, as this
is what KeY requires. This is done by specifying the problem in the KeY 
file in the following way: (invariant & guard) -> {var0:=var} \<{std}\> 
(invariant & var<var0 & var>=0). The code of the loop body is specified at 
std to verify that after one iteration of the loop body the variant var is smaller 
than before but greater than or equal to zero. 
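A minimal Python sketch of how such problem files could be generated is given below. The header follows Listing 2 and the termination formula follows the pattern above; the function names, the default paths, and the declaration of the fresh variable var0 as a program variable are assumptions, and CorC's real generator may differ. As in Listing 2, the \<{...}\> diamond modality requires the code to terminate in a state satisfying the postcondition.

```python
from typing import Sequence


def _key_file(variables: Sequence[str], formula: str,
              java_source: str = "src", helper: str = "helper.key") -> str:
    """Wrap a dynamic-logic formula in the header layout of Listing 2."""
    decls = " ".join(f"{v};" for v in variables)
    return (
        f'\\javaSource "{java_source}";\n'
        f'\\include "{helper}";\n'
        f"\\programVariables {{{decls}}}\n"
        f"\\problem {{\n{formula}\n}}\n"
    )


def hoare_problem(variables, pre, code, post, **kw) -> str:
    """Concrete triple {pre} code {post} as pre -> \\<{code}\\> post."""
    return _key_file(variables, f"({pre}) -> \\<{{{code}}}\\> ({post})", **kw)


def termination_problem(variables, inv, guard, variant, body, **kw) -> str:
    """One iteration from inv & guard keeps inv and decreases the variant, staying >= 0."""
    formula = (
        f"({inv} & {guard}) -> {{var0:={variant}}} \\<{{{body}}}\\> "
        f"({inv} & ({variant})<var0 & ({variant})>=0)"
    )
    # Assumption of this sketch: var0 is declared as an extra program variable.
    return _key_file(list(variables) + ["int var0"], formula, **kw)


# Reproduces Listing 2
print(hoare_problem(["int x"], "x = 0", "x=x+1;", "x = 1"))
# Termination obligation for the loop of the linear search example
print(termination_problem(["int[] a", "int x", "int i"],
                          "!appears(a, x, i+1, a.length)", "i>=0 & a[i]!=x",
                          "i+1", "i=i-1;"))
```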

To verify Hoare triples in the graphical editor, we implemented a menu entry. 
The user can right-click on a statement and start the automatic proof. If the 
proof is not closed, the user can interact with the opened KeY interface. To 
prove Hoare triples in the textual editor, we automatically generate all needed 
problem files for KeY whenever the user saves the editor file. The proof of these
files is started using a menu button. Markers in the editor show the user which
triples are not yet proven.


4.4 Implementation as Eclipse Plugin 


We extended the Eclipse Modeling Framework with plugins to implement the two
editors. We have created a meta model of the CbC language to represent the
required constructs (i.e., statements with specification). The statements can be
nested to create the CbC refinement hierarchy. The graphical and the textual
editor are projections on the same meta model. The graphical editor is
implemented using the framework Graphiti.⁴ It provides functionality to create nodes
and to associate them with domain elements, such as statements and specifications.
The nodes can be added from a palette at the side of the editor, so no
incorrect statement with its associated specification can be created. We implemented
editing functionality to change the text in the node; the background model is
changed simultaneously. Graphiti also provides the possibility to update nodes
(e.g., to propagate pre- and postconditions) when nodes are connected by
refinement edges. The refinement is checked for compliance with the CbC rules.

The textual editor is implemented using Xtext.⁵ We created a grammar
covering every statement and the associated specification. If the user writes a 
program, the text is parsed and translated to an instance of the meta model. If a 
program is created in one editor, a model (an instance of our meta model) of the 
program is created in the background. We can easily transform one view into the 
other. The transformation is a generation step and not a live synchronization 
between both views, but it is carried out invisibly for the user when changing 
the views. 
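As a rough illustration of two views sharing one model, the following Python sketch stands in for the EMF meta model with three dataclasses and implements a textual projection that approximates the statement part of Listing 1 on a single line. The class and field names are hypothetical; a graphical projection would traverse the same objects to draw nodes and edges.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Assignment:
    code: str


@dataclass
class Composition:
    first: "Stmt"
    intm: str
    second: "Stmt"


@dataclass
class Repetition:
    guard: str
    inv: str
    var: str
    body: "Stmt"


Stmt = Union[Assignment, Composition, Repetition]


def to_text(s: Stmt) -> str:
    """Textual projection of the model, roughly following Listing 1."""
    if isinstance(s, Assignment):
        return s.code
    if isinstance(s, Composition):
        return f'{{ {to_text(s.first)} }} intm: ["{s.intm}"] {{ {to_text(s.second)} }}'
    if isinstance(s, Repetition):
        return (f'while ("{s.guard}") inv: ["{s.inv}"] var: ["{s.var}"] '
                f'do {{ {to_text(s.body)} }} od')
    raise TypeError(f"unknown statement: {s!r}")


inv = "!appears(a, x, i+1, a.length)"
program = Composition(
    Assignment("i=a.length-1;"),
    inv,
    Repetition("i>=0 & a[i]!=x", inv, "i+1", Assignment("i=i-1;")),
)
text = to_text(program)
print(text)
```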


⁴ https://eclipse.org/graphiti/.
⁵ https://eclipse.org/Xtext/.
Table 1. Evaluation of the example programs

Algorithm        | #Nodes in GE | #Lines in TE | #Lines with JML | #Verified CorC triples | CbC total proof nodes | CbC total proof time | PhV total proof nodes | PhV total proof time
Linear Search    | 5            | 12           | 10              | 5/5                    | 285                   | 0.45 s               | 589                   | 1.2 s
Max. Element     | 9            | 21           | 15              | 9/9                    | 1023                  | 1.2 s                | 993                   | 1.8 s
Pattern Matching | 14           | 23           | 20              | 13/13                  | 21131                 | 54.9 s               | 201619                | 1479.3 s
Exponentiation   | 7            | 21           | 17              | 7/7                    | 6588                  | 15.2 s               | 7303                  | 20.4 s
Log. Approx.     | 5            | 16           | 12              | 5/5                    | 13756                 | 42.7 s               | 18835                 | 68.5 s
Dutch Flag       | 8            | 26           | 24              | 8/8                    | 4107                  | 5.7 s                | 4993                  | 13.4 s
Factorial        | 5            | 15           | 13              | 4/4                    | 1554                  | 3.6 s                | 1598                  | 4.4 s

(GE) Graphical Editor, (TE) Textual Editor, (PhV) Post-hoc Verification


In implementing CorC, we considered the exchangeability of the host lan- 
guage. The specifications and assignments are saved as strings in the meta 
model. They are checked by a parser to comply with Java. This parser could 
be exchanged to support a different language. The verification is done by gener- 
ating KeY files which are then evaluated by KeY. Here, we have to exchange the 
generation of the files if another theorem prover is to be integrated. The
information in the meta model may have to be adapted to fit the needs of the other
prover. We would also have to implement a programmatic call to the other prover.


5 Evaluation 


The tool support offers new opportunities to evaluate CbC versus post-hoc verification.
We quantitatively compare the development and verification of programs with 
CorC and with post-hoc verification. This is to check the hypothesis that the 
verification of algorithms is faster with CorC than with post-hoc verification. We 
created the first eight algorithms from the book by Kourie and Watson [19] in our 
graphical editor. For comparison purposes, we also wrote each example as a plain 
Java program with JML specifications in order to directly verify it with KeY. 
The specifications are the same as in CorC. We measured the verification time 
and the proof nodes that KeY needed to close the proofs for both approaches. 
The results of the evaluation are presented in Table 1 (verification time rounded). 
Fig. 5. Proof time of CbC and post-hoc verification in logarithmic scale


The algorithms have 5 to 14 nodes in the graphical editor and 12 to 26 lines 
of code in the textual editor. The Java version with a JML specification always
has fewer lines (between 8% and 29% fewer). The additional specifications of the
CbC program, such as the intermediate conditions of composition statements, the
global invariant conditions, and the variable declarations, account for the extra lines.

The verification of the eight algorithms worked nearly without problems: we
verified 7 of the 8 examples within CorC. In these cases, every Hoare triple
and the termination of the loops could be proven. For some algorithms (pattern
matching and factorial), we had to prove fewer Hoare triples than there are nodes
in the editor, as not every node has to be proven separately. Composition nodes
are proven indirectly through the refinement structure. For exponentiation,
logarithm, and factorial, we had to implement recursive helper methods which are
used in the specification. Therefore, the programs impose upper bounds for
integers to shorten the proof. The binary search algorithm could not be verified
automatically in KeY using post-hoc verification or CorC. In each step, when the
element is not found, the algorithm halves the array. KeY could not prove that the
searched element is within the new boundaries because verification problems
involving arithmetic division are hard for KeY to prove automatically.

In terms of proof nodes, only for maximum element does post-hoc verification
need slightly fewer nodes than CbC (993 vs. 1023). In all other cases, the
post-hoc proofs need 3% to 854% more nodes than the proofs for the algorithms
constructed with CbC. The largest difference was measured for the pattern matching
algorithm, where the CbC proof needs less than a ninth of the nodes of the post-hoc proof.

The verification time is visualized in Fig. 5. The time is measured in
milliseconds and scaled logarithmically. The CbC proofs are faster in all cases,
which indicates lower proof complexity. For maximum element, exponentiation,
logarithm, and factorial, post-hoc verification requires between 22% and 60%
more time. The difference increases to 137% for Dutch flag and 176% for linear
search. The pattern matching algorithm shows the biggest difference: the CbC
approach needs nearly a minute, but the post-hoc approach needs over 24 min. To
test our hypothesis, we apply the non-parametric paired Wilcoxon test [30] with a
significance level of 5%. We can reject the null hypothesis that there is no
difference in verification time between CbC verification and post-hoc verification
(p-value = 0.007813). This rejection of the null hypothesis is empirical evidence
for our hypothesis that verification is faster with CorC than with post-hoc
verification.
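The reported p-value can be reproduced under one assumption: an exact one-sided Wilcoxon signed-rank test over the seven algorithms verified with both approaches. The Python sketch below enumerates all 2^7 sign assignments instead of calling a statistics library. Because every CbC time in Table 1 is smaller, the statistic W+ takes its maximum value 28, so only the signs of the differences matter and p = 1/128 = 0.0078125, which matches the reported 0.007813 after rounding.

```python
from itertools import product

# (CbC time, post-hoc time) in seconds, read from Table 1
times = {
    "Linear Search": (0.45, 1.2),
    "Max. Element": (1.2, 1.8),
    "Pattern Matching": (54.9, 1479.3),
    "Exponentiation": (15.2, 20.4),
    "Log. Approx.": (42.7, 68.5),
    "Dutch Flag": (5.7, 13.4),
    "Factorial": (3.6, 4.4),
}


def wilcoxon_exact_greater(pairs):
    """Exact one-sided Wilcoxon signed-rank test of H1: second value > first value.

    Assumes no zero differences and no ties among the absolute differences.
    Returns (W+, p), where W+ is the sum of ranks of the positive differences.
    """
    diffs = [b - a for a, b in pairs]
    n = len(diffs)
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    rank = {k: r + 1 for r, k in enumerate(order)}
    w_plus = sum(rank[k] for k in range(n) if diffs[k] > 0)
    # Under H0, each rank 1..n is positive or negative with probability 1/2.
    at_least = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r * s for r, s in zip(range(1, n + 1), signs)) >= w_plus
    )
    return w_plus, at_least / 2 ** n


w_plus, p = wilcoxon_exact_greater(times.values())
print(w_plus, p)  # 28 0.0078125
```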

With our tool support, we were able to compare the CbC approach with post-hoc
verification. For our examples, the verification effort was reduced significantly,
which indicates reduced proof complexity. It is worthwhile
to further investigate the CbC approach, also to profit from synergistic effects 
in combination with post-hoc verification. As we built CorC on top of KeY, the 
post-hoc verification of programs constructed with CorC is feasible. 

An advantage of CorC is the overview of all Hoare triples during development.
In this way, we found some specifications where descriptions in the book
by Kourie and Watson [19] were not precise enough to verify the problem in 
KeY. For example, in the pattern matching algorithm, we had to verify two 
nested loops. At one point, we had to verify that the invariant of the inner loop 
implies the invariant of the outer loop. This was not possible, so we extended the 
invariant of the inner loop to be the conjunction of both invariants. In the book 
of Kourie and Watson [19], this conjunction of both invariants was not explicitly 
used. 


6 Related Work 


We compare CorC to other programming languages and tools using specification 
or refinements. The programming language Eiffel is an object-oriented program- 
ming language with a focus on design-by-contract [21,22]. Classes and methods 
are annotated with pre-/postconditions and invariants. Programs written in Eif- 
fel can be verified using AutoProof [18,28]. The verification tool translates the 
program with assertions to a logic formula. An SMT-solver proves the correct- 
ness and returns the result. Spec# is a similar tool for specifying C# programs 
with pre-/postcondition contracts. These programs can be verified using Boogie. 
The code and specification are translated to an intermediate language (BoogiePL)
and verified [5,6]. VCC [8] is a tool to annotate and verify C code. For this pur- 
pose, it reuses the Spec# tool chain. VeriFast [16] is another tool to verify C 
and Java programs with the help of contracts. The contracts are written in sep- 
aration logic (a variant of Hoare logic). As in Eiffel, the focus of Spec#, VCC, 
and VeriFast is on post-hoc verification and debugging failed proof attempts. 
The Event-B framework [2] is a related CbC approach. Automata-based 
systems including a specification are refined to a concrete implementation. 
Atelier B [1] implements the B method by providing an automatic and
interactive prover. Rodin [3] is another tool implementing the Event-B method. The
main difference to CorC is that CorC works on code and specifications rather 
than on automata-based systems. 

ArcAngel [25] is a tool supporting Morgan’s refinement calculus. Rules are 
applied to an initial specification to produce a correct implementation. The tool 
implements a tactic language for refinements to apply a sequence of rules. In 
comparison to our tool, ArcAngel does not offer a graphical editor to visualize 
the refinement steps. Another difference is that ArcAngel creates a list of proof 
obligations which have to be proven separately. CRefine [26] is a related tool for 
the Circus refinement calculus, a calculus for state-rich reactive systems. Like 
our tool, CRefine provides a GUI for the refinement process. The difference is 
that we specify and implement source code, but they use a state-based language. 
ArcAngelC [10] is an extension to CRefine which adds refinement tactics. 

The tools iContract [20] and OpenJML [9] apply design-by-contract. They 
use a special comment tag to insert conditions into Java code. These conditions 
are translated to assertions and checked at runtime, which differs from our
tool in that no formal verification is performed. DBC-Python is a similar approach
for the Python language which also checks assertions at runtime [27]. 

To verify the CbC program, we need a theorem prover for Hoare triples, 
such as KeY [4]. There are other theorem provers which could be used (e.g., 
Coq [7] or Isabelle/HOL [24]). The Tecton Proof System [17] is a related tool 
to structure and interactively prove Hoare logic specifications. The proofs are
represented graphically as a set of linked trees. These interactive provers do not
fit our needs because we want to automate the verification process. KeY provides
a symbolic execution debugger (SED) that visualizes all execution paths of the
code together with the specifications used in the verification [15]. This
visualization is similar to our tree representation in the graphical editor. The SED
can be used to debug a program if an error occurs during the post-hoc verification process.


7 Conclusion and Future Work 


We implemented CorC to support the Correctness-by-Construction process of 
program development. We created a textual and a graphical editor that can be 
used interchangeably to enable different styles of CbC-based program develop- 
ment. The program and its specification are written in one of the editors and 
can be verified using KeY. This reduces the proof complexity with respect to 
post-hoc verification. We extended the KeY ecosystem with CorC. CorC opens
the possibility to use CbC in areas where post-hoc verification is established, so
that programmers can benefit from synergistic effects of both approaches. With tool
support, CbC can be studied in experiments to determine the value of using 
CbC in industry. 
For future work, we want to extend the tool support, and we want to evaluate
empirically the benefits and drawbacks of CorC. To extend the expressiveness,
we plan to implement a rule for method calls in CorC. These methods
have to be verified independently by CorC/KeY. We could investigate whether
the method call rules of KeY can be used for our CbC approach. Another direction
for future work is the inference of conditions to reduce the manual effort. Postconditions
can be generated automatically for known statements by using the strongest 
postcondition calculus. Invariants could be generated by incorporating external 
tools. As mentioned earlier, other host languages and other theorem provers can 
be integrated in our IDE. 

The second work package for future work comprises an evaluation with a
user study. We could compare the effort of creating and verifying algorithms 
with post-hoc verification and with our tool support. The feedback can be used 
to improve the usability of the tool. 
Abstract. Static program analysis often encounters problems in analyzing
library code. Most real-world programs use library functions inten-
sively, and library functions are usually written in different languages. 
For example, static analysis of JavaScript programs requires analysis of 
the standard built-in library implemented in host environments. A com- 
mon approach to analyze such opaque code is for analysis developers to 
build models that provide the semantics of the code. Models can be built 
either manually, which is time consuming and error prone, or automati- 
cally, which may limit application to different languages or analyzers. In 
this paper, we present a novel mechanism to support automatic modeling 
of opaque code, which is applicable to various languages and analyzers. 
For a given static analysis, our approach automatically computes anal- 
ysis results of opaque code via dynamic testing during static analysis. 
By using testing techniques, the mechanism does not guarantee sound 
over-approximation of program behaviors in general. However, it is fully 
automatic, is scalable in terms of the size of opaque code, and provides 
more precise results than conventional over-approximation approaches. 
Our evaluation shows that although not all functionalities in opaque code 
can (or should) be modeled automatically using our technique, a large 
number of JavaScript built-in functions are approximated soundly yet 
more precisely than existing manual models. 


Keywords: Automatic modeling · Static analysis · Opaque code ·
JavaScript


1 Introduction 


Static analysis is widely used to optimize programs and to find bugs in them, 
but it often faces difficulties in analyzing library code. Since most real-world pro- 
grams use various libraries usually written in different programming languages, 
analysis developers should provide analysis results for libraries as well. For exam- 
ple, static analysis of JavaScript apps involves analysis of the builtin functions 
implemented in host environments like the V8 runtime system written in C++. 
A conventional approach to analyze such opaque code is for analysis developers
to create models that provide the analysis results of the opaque code.
Models approximate the behaviors of opaque code and are often tightly
integrated with specific static analyzers to support precise abstract semantics that
are compatible with the analyzers’ internals.

Developers can create models either manually or automatically. Manual mod- 
eling is complex, time consuming, and error prone because developers need 
to consider all the possible behaviors of the code they model. In the case of 
JavaScript, the number of APIs to be modeled is large and ever-growing as 
the language evolves. Thus, various approaches have been proposed to model 
opaque code automatically. They create models either from specifications of the 
code’s behaviors [2,26] or using dynamic information during execution of the 
code [8,9,22]. The former approach heavily depends on the quality and format 
of available specifications, and the latter approach is limited by the capabilities
of instrumentation or of specific analyzers.

In this paper, we propose a novel mechanism to model the behaviors of 
opaque code to be used by static analysis. While existing approaches aim to cre- 
ate general models for the opaque code’s behaviors, which can produce analysis 
results for all possible inputs, our approach computes specific results of opaque 
code during static analysis. This on-demand modeling is specific to the abstract 
states of a program being analyzed, and it consists of three steps: sampling, 
run, and abstraction. When static analysis encounters opaque code with some 
abstract state, our approach generates samples that are a subset of all possible 
inputs of the opaque code by concretizing the abstract state. After evaluating the 
code using the concretized values, it abstracts the results and uses them during
analysis. Since the sampling generally covers only a small subset of infinitely many
possible inputs to opaque code, our approach does not guarantee the soundness 
of the modeling results just like other automatic modeling techniques. 
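The three steps can be illustrated with a toy Python sketch in which integer intervals serve as the abstract domain and an ordinary Python function stands in for opaque code. The interval domain and the evenly spaced sampling are assumptions for illustration only (the sampling strategy proposed here is combinatorial); the second call shows how a behavior between samples is missed, which is why this kind of modeling does not guarantee soundness.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Interval:
    """Toy abstract value: all integers in [lo, hi]."""
    lo: int
    hi: int

    def join(self, other: "Interval") -> "Interval":
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))


def sample(v: Interval, k: int = 5) -> List[int]:
    """Sample: up to k (k >= 2) evenly spaced concrete values from gamma(v), bounds included."""
    if v.hi - v.lo + 1 <= k:
        return list(range(v.lo, v.hi + 1))
    return sorted({v.lo + j * (v.hi - v.lo) // (k - 1) for j in range(k)})


def run(opaque: Callable[[int], int], x: int) -> int:
    """Run: execute the opaque code on one concrete input."""
    return opaque(x)


def abstract(results: Iterable[int]) -> Interval:
    """Abstract: apply alpha (x -> [x, x]) to each result and join them."""
    points = [Interval(r, r) for r in results]
    out = points[0]
    for p in points[1:]:
        out = out.join(p)
    return out


def sra(opaque: Callable[[int], int], v: Interval) -> Interval:
    return abstract(run(opaque, x) for x in sample(v))


print(sra(lambda x: x * x, Interval(-2, 2)))                # Interval(lo=0, hi=4)
print(sra(lambda x: 100 if x == 3 else x, Interval(0, 8)))  # Interval(lo=0, hi=8), misses 100
```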

The sampling strategy should select well-distributed samples to explore the 
opaque code’s behaviors as much as possible and to avoid redundant ones.
Generating too few samples may miss too many behaviors, while redundant samples
cause performance overhead. As a simple yet effective way to control the
number of samples, we propose to use combinatorial testing [11]. 
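One common way to realize combinatorial testing is pairwise (2-way) coverage: every combination of values for every two parameters appears in at least one sample. The Python sketch below uses a simple greedy construction chosen for brevity; it is not necessarily the algorithm used in the implementation, and it enumerates the full cartesian product of candidates, which a practical tool would avoid. The parameter domains in the usage lines are hypothetical representative values.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple


def pairwise_samples(domains: Sequence[Sequence]) -> List[Tuple]:
    """Greedily pick rows until every value pair of every two parameters occurs in a row."""
    n = len(domains)
    if n < 2:
        return list(product(*domains))
    param_pairs = list(combinations(range(n), 2))
    uncovered = {(i, a, j, b) for i, j in param_pairs for a in domains[i] for b in domains[j]}
    candidates = list(product(*domains))  # acceptable for a sketch only

    def gain(row):
        # number of still-uncovered pairs this row would cover
        return sum((i, row[i], j, row[j]) in uncovered for i, j in param_pairs)

    chosen = []
    while uncovered:
        best = max(candidates, key=gain)
        chosen.append(best)
        uncovered -= {(i, best[i], j, best[j]) for i, j in param_pairs}
    return chosen


# Hypothetical representative values for three arguments of an opaque function
doms = [["undefined", "null", "42", "'s'"], ["true", "false"], ["-1", "0", "1"]]
rows = pairwise_samples(doms)
print(len(rows), "of", len(list(product(*doms))), "combinations")
```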

We implemented the proposed automatic modeling as an extension of SAFE, 
a JavaScript static analyzer [13,17]. For opaque code encountered during anal- 
ysis, the extension generates concrete inputs from abstract states, and executes 
the code dynamically using the concrete inputs via a JavaScript engine (Node.js 
in our implementation). Then, it abstracts the execution results using the oper- 
ations provided by SAFE such as lattice-join and our over-approximation, and 
resumes the analysis. 

Our paper makes the following contributions: 


— We present a novel way to handle opaque code during static analysis by 
computing a precise on-demand model of the code using (1) input samples 
that represent analysis states, (2) dynamic execution, and (3) abstraction. 
— We propose a combinatorial sampling strategy to efficiently generate
well-distributed input samples.

— We evaluate our tool against hand-written models for large parts of 
JavaScript’s builtin functions in terms of precision, soundness, and 
performance. 

— Our tool revealed implementation errors in existing hand-written models, 
demonstrating that it can be used for automatic testing of static analyzers. 


In the remainder of this paper, we present our Sample-Run-Abstract approach
to model opaque code for static analysis (Sect. 2) and describe the sampling
strategy we use (Sect. 3). We then discuss our implementation and experiences
of applying it to JavaScript analysis (Sect. 4), evaluate the implementation using 
ECMAScript 5.1 builtin functions as benchmarks (Sect. 5), discuss related work 
(Sect. 6), and conclude (Sect. 7). 


2 Modeling via Sample-Run-Abstract 


Our approach models opaque code with a universal model that can handle
arbitrary opaque code. Rather than statically generating a specific model for
each piece of opaque code, it provides a single general model that computes
results for given states from the concrete semantics via dynamic execution. We
call this universal model the SRA model.

In order to create the SRA model for a given static analyzer A and a dynamic
executor E, we assume the following:


— The static analyzer A is based on abstract interpretation [6]. It provides the
abstraction function $\alpha : \wp(\mathbb{S}) \to \hat{\mathbb{S}}$ and the concretization function $\gamma : \hat{\mathbb{S}} \to \wp(\mathbb{S})$
for a set of concrete states $\mathbb{S}$ and a set of abstract states $\hat{\mathbb{S}}$.

— An abstract domain forms a complete lattice, which has a partial order among
its values from $\bot$ (bottom) to $\top$ (top).

— For a given program point $c \in \mathbb{C}$, either A or E can identify the code
corresponding to the point.


Then, the SRA model consists of the following three steps: 


— $\mathsf{Sample} : \hat{\mathbb{S}} \to \wp(\mathbb{S})$
For a given abstract state $\hat{s} \in \hat{\mathbb{S}}$, Sample chooses a finite set of elements from
$\gamma(\hat{s})$, the set of concrete states that $\hat{s}$ represents. Because it is, in general,
impossible to execute opaque code dynamically with all possible inputs, Sample
should select representative elements efficiently, as we discuss in the next section.

— $\mathsf{Run} : \mathbb{C} \times \mathbb{S} \to \mathbb{S}$
For a given program point and a concrete state at this point, Run generates
executable code corresponding to the point and state, executes the code, and
returns the result state of the execution.

— $\mathsf{Abstract} : \wp(\mathbb{S}) \to \hat{\mathbb{S}}$
For a given set of concrete states, Abstract produces an abstract state that
encompasses the concrete states. One can apply $\alpha$ to each concrete state, join
all the resulting abstract states, and optionally apply an over-approximation
heuristic $\mathsf{Broaden} : \hat{\mathbb{S}} \to \hat{\mathbb{S}}$, comparable to widening, to mitigate behaviors of
the opaque code that are missed due to the under-approximate sampling.

Fig. 1. An abstract domain for even and odd integers


We write the SRA model as $\llbracket \cdot \rrbracket_{\mathsf{sra}} : \mathbb{C} \times \hat{\mathbb{S}} \to \hat{\mathbb{S}}$ and define it as follows:


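$$
\begin{aligned}
\llbracket (c, \hat{s}) \rrbracket_{\mathsf{sra}}
  &= \mathsf{Abstract}(\{\mathsf{Run}(c, s) \mid s \in \mathsf{Sample}(\hat{s})\}) \\
  &= \mathsf{Broaden}\Big(\bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \mathsf{Sample}(\hat{s})\}\Big)
\end{aligned}
$$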
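A minimal Python sketch of the definition above, assuming the analyzer supplies Sample, Run, a per-state abstraction (α applied to a singleton set), a join operator, and the bottom element as plain functions. All names are illustrative and not SAFE's API; Broaden defaults to the identity.

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")  # program points (the set C in the text)
S = TypeVar("S")  # concrete states
A = TypeVar("A")  # abstract states


def sra(
    c: P,
    s_hat: A,
    sample: Callable[[A], Iterable[S]],
    run: Callable[[P, S], S],
    alpha_one: Callable[[S], A],
    join: Callable[[A, A], A],
    bottom: A,
    broaden: Callable[[A], A] = lambda a: a,
) -> A:
    # Sample: pick finitely many concrete states from gamma(s_hat).
    # Run: execute the opaque code at c on each sampled state.
    # Abstract: abstract each result (alpha of a singleton), join them all,
    # then apply the optional over-approximation Broaden.
    abstracted = (alpha_one(run(c, s)) for s in sample(s_hat))
    return broaden(reduce(join, abstracted, bottom))


# Usage with a trivial powerset "domain": abstract values are finite sets,
# alpha of one state is a singleton set, and join is set union.
result = sra(
    c=lambda x: x * x,
    s_hat=frozenset({-1, 1, 2}),
    sample=lambda s_hat: s_hat,  # a finite set is its own sample
    run=lambda c, s: c(s),
    alpha_one=lambda s: frozenset({s}),
    join=frozenset.union,
    bottom=frozenset(),
)
print(sorted(result))  # [1, 4]
```

With this finite powerset domain the sketch returns exactly the image of the sampled inputs, so all of the approximation comes from Sample, which is the point the following examples develop.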


We now describe how $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ works using an example abstract domain for
even and odd integers, shown in Fig. 1. Consider the code snippet
x := abs(x) at a program point c where the library function abs is opaque.
In this example, concrete states are maps from variables to their concrete values,
abstract states are maps from variables to their abstract values, and Broaden is
the identity function.


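Case $\hat{s}_1 = [x : n]$ where n is a constant integer:

$$
\begin{aligned}
\llbracket (c, \hat{s}_1) \rrbracket_{\mathsf{sra}}
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \mathsf{Sample}(\hat{s}_1)\} \\
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \{[x : n]\}\} \\
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, [x : n])\})\} \\
  &= \alpha(\{[x : |n|]\})
\end{aligned}
$$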


Because the given abstract state $\hat{s}_1$ contains a single abstract value corresponding
to a single concrete value, Sample produces the set of all possible states, which
makes $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ provide a sound and also the most precise result.


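Case $\hat{s}_2 = [x : \mathit{Even}]$:

$$
\begin{aligned}
\llbracket (c, \hat{s}_2) \rrbracket_{\mathsf{sra}}
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \mathsf{Sample}(\hat{s}_2)\} \\
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \{[x : -2], [x : 0], [x : 2]\}\} \\
  &= \alpha(\{[x : 0], [x : 2]\}) \\
  &= [x : \mathit{Even}]
\end{aligned}
$$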


When Sample selects three elements from the set of all possible states
represented by $\hat{s}_2$, executing abs results in $\{[x : 0], [x : 2]\}$. Since joining the
abstractions of these two states produces $[x : \mathit{Even}]$, $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ models the correct
behavior of abs by taking advantage of the abstract domain.
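Case $\hat{s}_3 = [x : \mathit{Int}]$:

$$
\begin{aligned}
\llbracket (c, \hat{s}_3) \rrbracket_{\mathsf{sra}}
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \mathsf{Sample}(\hat{s}_3)\} \\
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \mathsf{Sample}(\hat{s}_2) \cup \mathsf{Sample}([x : \mathit{Odd}])\} \\
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \{[x : -2], [x : -1], [x : 0], [x : 1], [x : 2], [x : 3]\}\} \\
  &= \alpha(\{[x : 0], [x : 1], [x : 2], [x : 3]\}) \\
  &= [x : \mathit{Int}]
\end{aligned}
$$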


When an abstract value has a finite number of elements that are immediately
below it in the abstract domain lattice, our sampling strategy selects samples
from them recursively. Thus, in this example, $\mathsf{Sample}([x : \mathit{Int}])$ becomes the
union of $\mathsf{Sample}([x : \mathit{Even}])$ and $\mathsf{Sample}([x : \mathit{Odd}])$. We explain this recursive
sampling strategy in Sect. 3.


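Case $\hat{s}_4 = [x : \mathit{Odd}]$:

$$
\begin{aligned}
\llbracket (c, \hat{s}_4) \rrbracket_{\mathsf{sra}}
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \mathsf{Sample}(\hat{s}_4)\} \\
  &= \bigsqcup \{\alpha(\{\mathsf{Run}(c, s)\}) \mid s \in \{[x : -1], [x : 1]\}\} \\
  &= \alpha(\{[x : 1]\})
\end{aligned}
$$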


While $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ produces sound and precise results for the first three cases, it
does not guarantee soundness; it may miss some behaviors of opaque code due
to the limitations of the sampling strategy. Assume that $\mathsf{Sample}([x : \mathit{Odd}])$
selects $\{[x : -1], [x : 1]\}$ this time, as in the last case. Then the model produces the
unsound result $[x : 1]$, which does not cover other odd results of abs such as 3,
because the selected values explore only part of the behavior of abs. When the
number of possible states at a call site of opaque code is infinite, the sampling
strategy can lead to unsound results. A well-designed sampling strategy is
therefore crucial for our modeling approach, as it significantly affects both the
analysis performance and soundness. The approach is precise thanks to the
under-approximated results from sampling, but it entails a tradeoff between
analysis performance and soundness that depends on the number of samples. In
the next section, we propose a strategy to generate samples for various abstract
domains and to control sample sizes effectively.
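The four cases can be replayed with a small Python sketch. It simplifies each state to the single value of x, encodes constants as Python ints and Even, Odd, and Int as strings, and uses the seed samples from the cases above; this encoding is an illustrative assumption, and Broaden is the identity.

```python
from functools import reduce

BOT = "Bot"
SEEDS = {"Even": [-2, 0, 2], "Odd": [-1, 1, 3]}  # seed samples used above
COVERS = {"Int": ["Even", "Odd"]}                # Int covers Even and Odd


def up(v):
    """Map a constant to its parity class; leave Even/Odd/Int unchanged."""
    if isinstance(v, int):
        return "Even" if v % 2 == 0 else "Odd"
    return v


def join_val(a, b):
    """Least upper bound in the even/odd lattice of Fig. 1."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == b:
        return a
    a, b = up(a), up(b)
    return a if a == b else "Int"


def sample_val(v, seeds):
    if isinstance(v, int):  # a constant denotes exactly one value
        return [v]
    if v in COVERS:         # finitely many covered values: recurse
        return [n for b in COVERS[v] for n in sample_val(b, seeds)]
    return seeds[v]         # infinitely many values: use seed samples


def sra_abs(x_hat, seeds=SEEDS):
    """Model x := abs(x): Sample, Run abs, then join the results."""
    results = [abs(n) for n in sample_val(x_hat, seeds)]
    return reduce(join_val, results, BOT)


print(sra_abs(-7))      # 7
print(sra_abs("Even"))  # Even
print(sra_abs("Int"))   # Int
print(sra_abs("Odd"))   # Odd
print(sra_abs("Odd", {**SEEDS, "Odd": [-1, 1]}))  # 1, the unsound result
```

Dropping 3 from the Odd seeds is enough to turn the sound answer Odd into the constant 1, which is exactly the sensitivity to sampling described above.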


3 Combinatorial Sampling Strategy 


We propose a combinatorial sampling strategy (inspired by combinatorial
testing) that is organized by the types of values that an abstract domain
represents. The domains represent either primitive values like numbers and
strings, or object values like tuples, sets, and maps. Based on combinatorial
testing, our strategy is recursively defined on the hierarchy of abstract domains
used to represent program states. Assume that $\hat{a}, \hat{b} \in \hat{A}$ are abstract values
that we want to concretize using Sample.
Fig. 2. The SAFE number domain for JavaScript


3.1 Abstract Domains for Primitive Values 


To explain our sampling strategy for primitive abstract domains, we use the
DefaultNumber domain from SAFE as an example. DefaultNumber represents
JavaScript numbers with the subcategories shown in Fig. 2: NaN (not a
number), ±Inf (positive/negative infinity), UInt (unsigned integer), and NUInt
(not an unsigned integer, that is, a negative integer or a floating-point number).


Case $|\gamma(\hat{a})|$ is finite:
$\mathsf{Sample}(\hat{a}) = \gamma(\hat{a})$


When $\hat{a}$ represents a finite number of concrete values, Sample simply takes all the
values. For example, ±Inf has two possible values, +Inf and -Inf. Therefore,
Sample(±Inf) = {+Inf, -Inf}.


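Case $|\gamma(\hat{a})| = \infty$ and $\{\hat{b} \in \hat{A} \mid \hat{a} \text{ covers } \hat{b}\}$ is finite:

$\mathsf{Sample}(\hat{a}) = \bigcup \{\mathsf{Sample}(\hat{b}) \mid \hat{b} \in \hat{A},\ \hat{a} \text{ covers } \hat{b}\}$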

When $\hat{a}$ represents an infinite number of concrete values but covers (that is,
lies immediately above) a finite number of abstract values in the lattice,
Sample is applied recursively to each covered value, and the concrete results are
merged by set union. Note that “y covers x” holds whenever $x \sqsubset y$ and there is
no $z$ such that $x \sqsubset z \sqsubset y$. The number of samples increases linearly in this step.
Number falls into this case: it represents infinitely many numbers, but it covers
only four abstract values in the lattice: NaN, ±Inf, UInt, and NUInt.


Case $|\gamma(\hat{a})| = \infty$ and $\{\hat{b} \in \hat{A} \mid \hat{a} \text{ covers } \hat{b}\}$ is infinite:
$\mathsf{Sample}(\hat{a}) = H(\gamma(\hat{a}))$


When $\hat{a}$ represents infinitely many concrete values and also covers infinitely many
abstract values, we make the number of samples finite by applying a heuristic
injection H of seed samples. For seed samples, we propose the following guidelines
for selecting them manually:


— Use a small number of commonly used values. Our conjecture is that common
values will trigger the same behavior in opaque code repeatedly.

— Choose values that have special properties for known operators. For exam- 
ple, for each operator, select the minimum, maximum, identity, and inverse 
elements, if any. 
In the DefaultNumber domain example, UInt and NUInt fall into this case. For
the evaluation of our modeling approach in Sect. 5, we selected seed samples
based on the guidelines as follows:

$\mathsf{Sample}(\mathit{UInt}) = \{0, 1, 3, 10, 9999\}$

$\mathsf{Sample}(\mathit{NUInt}) = \{-10, -3, -1, -0.5, -0, 0.5, 3.14\}$
We experimentally show that this simple heuristic works well for automatic 
modeling of JavaScript builtin functions. 
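The three cases above can be sketched in Python for DefaultNumber, assuming a hypothetical encoding of Fig. 2 in which Number covers NaN, ±Inf, UInt, and NUInt, and the seed sets above play the role of the heuristic H. This illustrates the recursion only; it is not SAFE's implementation.

```python
import math
from typing import Dict, List

# Hypothetical encoding of Fig. 2: the abstract values each value covers.
COVERS: Dict[str, List[str]] = {"Number": ["NaN", "±Inf", "UInt", "NUInt"]}

# Abstract values with finitely many concrete values (Sample = gamma).
FINITE: Dict[str, List[float]] = {"NaN": [math.nan], "±Inf": [math.inf, -math.inf]}

# Seed samples for the heuristic H (the sets chosen for Sect. 5).
SEEDS: Dict[str, List[float]] = {
    "UInt": [0, 1, 3, 10, 9999],
    "NUInt": [-10, -3, -1, -0.5, -0.0, 0.5, 3.14],
}


def sample(a: str) -> List[float]:
    if a in FINITE:        # |gamma(a)| is finite: take every value
        return list(FINITE[a])
    if a in COVERS:        # finitely many covered values: recurse and concatenate
        return [v for b in COVERS[a] for v in sample(b)]
    return list(SEEDS[a])  # infinitely many covered values: seed samples H


print(len(sample("Number")))  # 15 = 1 + 2 + 5 + 7
```

The covered subcategories are disjoint, so concatenating their sample lists is the same as taking their set union.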


3.2 Abstract Domains for Object Values 


Our sampling strategy for object abstract domains consists of four steps. To
sample from a given abstract object $\hat{a} \in \hat{A}$, we assume the following:

— A concrete object $a \in \gamma(\hat{a})$ is a map from fields to their values: Map[F, V].

— The abstract domains for fields and values are $\hat{F}$ and $\hat{V}$, respectively.

— The abstract domain $\hat{A}$ provides two helper functions: $\mathsf{mustF} : \hat{A} \to \wp(F)$ and
$\mathsf{mayF} : \hat{A} \to \hat{F}$. The function $\mathsf{mustF}(\hat{a})$ returns the set of fields that every
$a \in \gamma(\hat{a})$ must have, and $\mathsf{mayF}(\hat{a})$ returns an abstract value $\hat{f} \in \hat{F}$ representing
the set of fields that some $a \in \gamma(\hat{a})$ may have.


Then, the sampling strategy proceeds in the following four steps:


1. Sampling fields

In order to construct sampled objects, the strategy first samples a finite number of fields.
JavaScript provides open objects, where fields can be added and removed
dynamically, and fields can be referenced not only by string literals but also
by arbitrary expressions of string values. Thus, this step collects fields from the
finite set of fields that all possible objects must contain ($F_{must}$) and samples
from the possibly infinite set of fields that some possible objects may (but need
not) contain ($F_{may}$):

$F_{must} = \mathsf{mustF}(\hat{a})$
$F_{may} = \mathsf{Sample}(\mathsf{mayF}(\hat{a})) \setminus F_{must}$


2. Abstracting values for the sampled fields
For the fields in $F_{must}$ and $F_{may}$ sampled from the given abstract object $\hat{a}$, it
constructs two maps from fields to their abstract values, $M_{must}$ and $M_{may}$,
respectively, of type Map[F, $\hat{V}$]:

$M_{must} = \lambda f \in F_{must}.\ \alpha(\{a(f) \mid a \in \gamma(\hat{a})\})$
$M_{may} = \lambda f \in F_{may}.\ \alpha(\{a(f) \mid a \in \gamma(\hat{a})\})$


3. Sampling values
From $M_{must}$ and $M_{may}$, it constructs another map $M_s : F \to \wp(V_\varepsilon)$, where
$V_\varepsilon = V \cup \{\varepsilon\}$ denotes the set of values extended with $\varepsilon$, which marks the
absence of a field, by applying Sample to the value of each field in $F_{must}$ and
$F_{may}$. The value of each field in $F_{may}$ contains $\varepsilon$ to denote that the field may be
absent:

$$
M_s = \lambda f \in F_{must} \cup F_{may}.
\begin{cases}
\mathsf{Sample}(M_{must}(f)) & \text{if } f \in F_{must} \\
\mathsf{Sample}(M_{may}(f)) \cup \{\varepsilon\} & \text{if } f \in F_{may}
\end{cases}
$$

4. Choosing samples by combinatorial testing
Finally, since the number of all combinations from $M_s$, $\prod_{f \in \mathit{Domain}(M_s)} |M_s(f)|$,
grows exponentially, the last step limits the number of selections. We solve this
selection problem by reducing it to a traditional testing problem with
combinatorial testing [3]. Combinatorial testing is a well-studied problem, and
efficient algorithms for generating test cases exist. It addresses a problem similar
to ours, increasing the dynamic coverage of code under test, but in the context
of finding bugs:
“The most common bugs in a program are generally triggered by
either a single input parameter or an interaction between pairs of
parameters.”
Thus, we apply each-used or pair-wise testing (1- or 2-wise) as the last step.


Now, we demonstrate each step using an abstract array object $\hat{a}$ whose length
is greater than or equal to 2 and whose elements are true or false. We
write $\top_b$ to denote an abstract value such that $\gamma(\top_b) = \{\mathit{true}, \mathit{false}\}$.


— Assumptions

• A concrete array object a is a map from indices to boolean values:
Map[UInt, Boolean].

• For the given abstract object $\hat{a}$, $\mathsf{mustF}(\hat{a}) = \{0, 1\}$ and $\mathsf{mayF}(\hat{a}) = \mathit{UInt}$.

• From Sect. 3.1, we sample {0, 1, 3, 10, 9999} for UInt.

• k-wise(M) generates a minimum-size set of test cases satisfying
all the requirements of k-wise testing for a map M. It constructs a test
case by choosing one element from the set of each field.

— Step 1: Sampling fields

$F_{must} = \{0, 1\}$
$F_{may} = \mathsf{Sample}(\mathit{UInt}) \setminus \{0, 1\} = \{3, 10, 9999\}$


— Step 2: Abstracting values for the sampled fields

$M_{must} = [0 \mapsto \top_b,\ 1 \mapsto \top_b]$
$M_{may} = [3 \mapsto \top_b,\ 10 \mapsto \top_b,\ 9999 \mapsto \top_b]$


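— Step 3: Sampling values

$M_s = [\,0 \mapsto \{\mathit{true}, \mathit{false}\},\ 1 \mapsto \{\mathit{true}, \mathit{false}\},\ 3 \mapsto \{\mathit{true}, \mathit{false}, \varepsilon\},$
$\quad 10 \mapsto \{\mathit{true}, \mathit{false}, \varepsilon\},\ 9999 \mapsto \{\mathit{true}, \mathit{false}, \varepsilon\}\,]$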


— Step 4: Choosing samples by combinatorial testing
The number of all combinations, $\prod_{f \in \mathit{Domain}(M_s)} |M_s(f)| = 2 \cdot 2 \cdot 3 \cdot 3 \cdot 3$, is 108,
even after sampling fields and values in an under-approximate manner. We can
avoid such an explosion of samples and keep the samples well distributed by using
combinatorial testing. With each-used testing, three combinations can cover every
element of the set for each field at least once:

$$
\begin{aligned}
\text{1-wise}(M_s) = \{\,
  &[0 \mapsto \mathit{true},\ 1 \mapsto \mathit{false},\ 3 \mapsto \mathit{true},\ 10 \mapsto \varepsilon,\ 9999 \mapsto \varepsilon], \\
  &[0 \mapsto \mathit{false},\ 1 \mapsto \mathit{true},\ 3 \mapsto \mathit{false},\ 10 \mapsto \mathit{false},\ 9999 \mapsto \mathit{true}], \\
  &[0 \mapsto \mathit{false},\ 1 \mapsto \mathit{true},\ 3 \mapsto \varepsilon,\ 10 \mapsto \mathit{true},\ 9999 \mapsto \mathit{false}]\,\}
\end{aligned}
$$


With pair-wise testing, 12 samples can cover every pair of elements from 
different sets at least once. 
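The worked example can be checked mechanically. The Python sketch below rebuilds $M_s$ from Steps 1 to 3, counts all combinations, and verifies that the three 1-wise samples listed above together cover every value of every field. Encoding the absence marker ε as the string "ε" and ⊤_b as the set {True, False} are assumptions of the sketch.

```python
from math import prod

ABSENT = "ε"                      # assumption: marker for an absent field
TOP_B = {True, False}             # gamma(⊤_b)
UINT_SEEDS = [0, 1, 3, 10, 9999]  # Sample(UInt) from Sect. 3.1

# Step 1: sampling fields (mustF = {0, 1}, mayF = UInt)
f_must = [0, 1]
f_may = [f for f in UINT_SEEDS if f not in f_must]

# Step 2: abstract values of the sampled fields (all ⊤_b here)
m_must = {f: TOP_B for f in f_must}
m_may = {f: TOP_B for f in f_may}

# Step 3: Sample(⊤_b) = gamma(⊤_b) since it is finite; F_may fields may be absent
m_s = {f: sorted(vs) for f, vs in m_must.items()}
m_s.update({f: sorted(vs) + [ABSENT] for f, vs in m_may.items()})

print(prod(len(vs) for vs in m_s.values()))  # 108

# Step 4: the three 1-wise samples listed in the text
one_wise = [
    {0: True, 1: False, 3: True, 10: ABSENT, 9999: ABSENT},
    {0: False, 1: True, 3: False, 10: False, 9999: True},
    {0: False, 1: True, 3: ABSENT, 10: True, 9999: False},
]
covers_all = all(set(vs) <= {t[f] for t in one_wise} for f, vs in m_s.items())
print(covers_all)  # True
```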


4 Implementation 


We implemented our automatic modeling approach for JavaScript because of its 
large number of builtin APIs and complex libraries, which are all opaque code 
for static analysis. They include the functions in the ECMAScript language stan- 
dard [1] and web standards such as DOM and browser APIs. We implemented 
the modeling as an extension of SAFE [13,17], a JavaScript static analyzer. 
When the analyzer encounters calls to opaque code during analysis, it uses the
SRA model of the code.


Sample. We applied the combinatorial sampling strategy to the SAFE abstract
domains. Of the abstract domains for primitive JavaScript values, UInt, NUInt,
and OtherStr represent an infinite number of concrete values (cf. the third case in
Sect. 3.1) and thus require the use of heuristics. We describe the details of our
heuristics and sample sets in Sect. 5.1.

We implemented the Sample step to use “each-used sample generation” for 
object abstract domains by default. In order to generate more samples, we added 
three options to apply pair-wise generation: 


— ThisPair generates pairs between the values of this and heap, 
— HeapPair among objects in the heap, and 
— ArgPair among property values in an arguments object. 


As an exception, we use the all-combination strategy for the DefaultDataProp
domain, which represents a JavaScript property consisting of a value and three
booleans: writable, enumerable, and configurable. Note that we use field
for language-independent objects and property for JavaScript objects. The
number of combinations of the three booleans is limited to $2^3 = 8$, so the number
of samples increases only linearly, which we consider acceptable. The Sample step
returns a finite set of concrete states, and each element of the set, which in turn
contains only concrete values, is passed to the Run step.
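The all-combination strategy for DefaultDataProp can be sketched as a Cartesian product of the sampled values with all $2^3$ assignments of the three attribute booleans. The dictionary keys mirror the ECMAScript property-descriptor attributes; the sampled values are arbitrary placeholders.

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence

ATTRS = ("writable", "enumerable", "configurable")


def sample_data_props(value_samples: Sequence[Hashable]) -> List[Dict[str, Hashable]]:
    """Pair every sampled value with all 2**3 combinations of the attributes."""
    return [
        {"value": v, **dict(zip(ATTRS, flags))}
        for v, flags in product(value_samples, product([True, False], repeat=len(ATTRS)))
    ]


props = sample_data_props([0, "a", None])
print(len(props))  # 24, i.e., 3 sampled values * 8 attribute combinations
```

The factor of 8 is fixed, so the sample count stays linear in the number of value samples.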
Run. For each concrete input state, the Run step obtains a result state by
executing the corresponding opaque code in four steps:


1. Generation of executable code 
First, Run populates object values from the concrete state. We currently omit
the JavaScript scope-chain information, because the library functions that we
analyze as opaque code are independent of the scope of user code. It then derives
executable code to invoke the opaque code and adds the argument values from
the static analysis context.

2. Execution of the code using a JavaScript engine 
Run executes the generated code using the JavaScript eval function on
Node.js. Populating objects and their properties from sample values before
invoking the opaque function may throw an exception. In such cases, Run
executes the code once again with a different sample value. If the second sample
value also throws an exception while populating the objects and their
properties, Run dismisses the code.

3. Serialization of the result state 
After execution, the result state contains the objects from the input state, the 
return value of the opaque code, and all the values that it might refer to. Also, 
any mutation of objects of the input state as well as newly created objects 
are captured in this way. We use a snapshot module of SAFE to serialize the 
result state into a JSON-like format. 

4. Transfer of the state to the analyzer 
The serialized snapshot is then passed to SAFE, where it is parsed, loaded, 
and combined with other results as a set of concrete result states. 
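The retry rule of step 2 can be sketched as follows, assuming two hypothetical callbacks: `populate`, which builds the input objects from one sample, and `invoke`, which calls the opaque code and returns its result state. Only failures during population trigger a retry; exceptions thrown by the opaque code itself are observable behavior and are left to `invoke`.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")  # concrete input states (samples)
R = TypeVar("R")  # result states


def run_with_retry(
    candidates: Sequence[S],
    populate: Callable[[S], None],
    invoke: Callable[[S], R],
    max_attempts: int = 2,
) -> Optional[R]:
    """Return the result for the first sample whose inputs can be built,
    trying at most max_attempts samples; None means the code is dismissed."""
    for s in candidates[:max_attempts]:
        try:
            populate(s)   # build the input objects and their properties
        except Exception:
            continue      # population failed: retry with a different sample
        return invoke(s)  # inputs are ready: run the opaque code
    return None


def populate(s: int) -> None:
    # Hypothetical population step that rejects negative samples.
    if s < 0:
        raise ValueError("cannot build inputs from a negative sample")


print(run_with_retry([-1, 2, 3], populate, lambda s: s * 10))   # 20
print(run_with_retry([-1, -2, 3], populate, lambda s: s * 10))  # None
```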


Abstract. To abstract result states, we mostly used existing operations in SAFE,
like lattice-join, and also implemented an over-approximation heuristic function,
Broaden, comparable to widening. We use Broaden for property name sets in
JavaScript objects, because mayF of a JavaScript abstract object can produce
an abstract value that denotes an infinite set of concrete strings, and
$\llbracket \cdot \rrbracket_{\mathsf{sra}}$ cannot produce such an abstract value from simple sampling and join.
Thus, we regard all possibly absent properties as sampled properties, and our
Broaden function merges all possibly absent properties into one abstract property
representing any property when the number of absent properties exceeds a
threshold proportional to the number of sampled properties.
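A sketch of this Broaden heuristic, assuming a simplified abstract object that maps property names to finite sets of values, a hypothetical summary key `"*"` meaning "any property", and an illustrative threshold ratio of 0.5; the text states only that the threshold is proportional to the number of sampled properties.

```python
from typing import Dict, FrozenSet, Hashable, Iterable

AbsVal = FrozenSet[Hashable]  # assumption: abstract values are finite sets
ANY_PROP = "*"                # hypothetical key meaning "any property"


def broaden(obj: Dict[str, AbsVal], maybe_absent: Iterable[str],
            n_sampled: int, ratio: float = 0.5) -> Dict[str, AbsVal]:
    """Summarize possibly absent properties once there are too many of them."""
    absent = [k for k in maybe_absent if k in obj]
    if len(absent) <= ratio * n_sampled:
        return dict(obj)  # below the threshold: keep every property separate
    kept = {k: v for k, v in obj.items() if k not in absent}
    # Join (here: set union) the values of all possibly absent properties
    # into a single abstract property that stands for any property name.
    kept[ANY_PROP] = frozenset().union(*(obj[k] for k in absent))
    return kept


obj = {
    "length": frozenset({2}),
    "3": frozenset({True}),
    "10": frozenset({False}),
    "9999": frozenset({True, False}),
}
print(broaden(obj, ["3", "10", "9999"], n_sampled=4))  # "3", "10", "9999" merged into "*"
```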


5 Evaluation 


We evaluated the $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model in two respects: (1) the feasibility of replacing
existing manual models (RQ1 and RQ2) and (2) the effects of our heuristic H
on the analysis soundness (RQ3). The research questions are as follows:


— RQ1: Analysis performance of $\llbracket \cdot \rrbracket_{\mathsf{sra}}$
Can $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ replace existing manual models for program analysis with decent
performance in terms of soundness, precision, and runtime overhead?

— RQ2: Applicability of $\llbracket \cdot \rrbracket_{\mathsf{sra}}$
Is $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ broadly applicable to various builtin functions of JavaScript?

— RQ3: Dependence on heuristic H
How much is the performance of $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ affected by the heuristics?


After describing the experimental setup for evaluation, we present our answers 
to the research questions with quantitative results, and discuss the limitations 
of our evaluation. 


5.1 Experimental Setup 


In order to evaluate the $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model, we compared its analysis performance and
applicability with those of the existing manual models in SAFE. We
used two kinds of subjects: browser benchmark programs and builtin functions.
From the 34 browser benchmarks included in the test suite of SAFE, a subset of
V8 Octane¹, we collected the 13 that invoke opaque code. Since browser
benchmark programs use a small number of opaque functions, we also generated
test cases for 134 functions in the ECMAScript 5.1 specification.

Each test case contains abstract values that represent two or more possible 
values. Because SAFE uses a finite number of abstract domains for primitive 
values, we used all of them in the test cases. We also generated 10 abstract
objects. Five of them were created manually to represent arbitrary objects:


OBJ1 has an arbitrary property whose value is an arbitrary primitive. 

OBJ2 is a property descriptor whose "value" is an arbitrary primitive, and 
the others are arbitrary booleans. 

OBJ3 has an arbitrary property whose value is OBJ2. 

OBJ4 is an empty array whose "length" is arbitrary. 

OBJ5 is an arbitrary-length array with an arbitrary property.


The other five objects were collected from SunSpider benchmark programs
by using Jalangi2 [20] to represent frequently used abstract objects. We counted
the number of function calls with object arguments and joined the most used
object arguments in each program. Out of 10 programs that have function
calls with object arguments, we discarded four programs that use the same
objects for every function call, and one program that uses an argument with
2500 properties, which makes manual inspection impossible. We joined the first
10 concrete objects for each argument of the following benchmarks to obtain
abstract objects: 3d-cube.js, 3d-raytrace.js, access-binary-trees.js, regexp-dna.js,
and string-fasta.js. For the 134 test functions, when a function consumes two
or more arguments, we restricted each argument to have only an expected type
to keep the number of test cases manageable. Also, for functions with a variable
number of arguments, we used one argument or the minimum number of arguments.

In summary, we used 13 programs for RQ1, and 134 functions with 1565 test
cases for RQ2 and RQ3. All experiments were run on a machine with a 2.9 GHz
quad-core Intel Core i7 and 16 GB of memory.


1 https://github.com/chromium/octane.
5.2 Answers to Research Questions


Answer to RQ1. We compared the precision, soundness, and analysis time of
the SAFE manual models and the $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model. Table 1 shows the precision and
soundness for each opaque function call, and Table 2 presents the analysis time
and number of samples for each program.

As for precision, Table 1 shows that $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ produced more precise results
than the manual models for 9 (19.6%) cases. We manually checked whether each
result of a model is sound by using the partial order function (⊑) implemented
in SAFE. We found that all the results of the SAFE manual models for
the benchmarks were sound. The $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model produced an unsound result for
only one function: Math.random. While it returns a floating-point value in the
range [0, 1), $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ modeled it as NUInt instead of the expected Number, because
it missed 0.

As shown in Table 2, the analysis with $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ took on average 1.35 times as long
as with the SAFE models (the mean of the per-program time ratios). The table
also shows the number of context-sensitive opaque function calls during analysis
(#Call), the maximum number of samples (#Max), and the total number of
samples (#Total). To better understand the runtime overhead, we measured the
proportion of elapsed time for each step. On average, Sample took 59%, Run 7%,
Abstract 17%, and the rest 17%. The experimental results show that $\llbracket \cdot \rrbracket_{\mathsf{sra}}$
provides high precision while slightly sacrificing soundness, with modest runtime
overhead.
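The 1.35 figure can be reproduced from Table 2 with a short Python computation. Our reading of the table, not a claim made in the text, is that 1.35 is the arithmetic mean of the per-program ratios; because access-binary-trees.js dominates the total time, the ratio of the summed times is below 1.

```python
from statistics import mean

# (manual time, SRA time) in ms per program, copied from Table 2.
TIMES = {
    "3d-morph.js": (1_423, 2_641),
    "access-binary-trees.js": (1_926_132, 1_784_866),
    "access-fannkuch.js": (1_615, 2_627),
    "access-nbody.js": (10_125, 25_564),
    "access-nsieve.js": (1_019, 1_126),
    "bitops-nsieve-bits.js": (282, 343),
    "math-cordic.js": (574, 662),
    "math-partial-sums.js": (1_613, 4_703),
    "math-spectral-norm.js": (10_702, 10_986),
    "string-fasta.js": (22_170, 6_147),
    "navier-stokes.js": (4_662, 5_104),
    "richards.js": (86_013, 88_902),
    "splay.js": (259_073, 217_863),
}

mean_ratio = mean(sra / manual for manual, sra in TIMES.values())
total_ratio = sum(s for _, s in TIMES.values()) / sum(m for m, _ in TIMES.values())
print(f"{mean_ratio:.2f} {total_ratio:.2f}")  # 1.35 0.93
```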


Answer to RQ2. Because the benchmark programs use only 15 opaque functions,
as shown in Table 1, we generated abstracted arguments for 134 out of the
169 functions in the ECMAScript 5.1 builtin library, for which SAFE has
manual models. We semi-automatically checked the soundness and precision of
the $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model by comparing the analysis results with their expected results.
Table 3 shows the results in terms of test cases (left half) and functions (right
half). The Equal column shows the number of test cases or functions for which
both models provide equal results that are sound. The SRA Pre. column shows
the number of cases where the $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model provides sound and more precise
results than the manual model. The Man. Uns. column presents the number
of cases where $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ provides sound results but the manual model provides
unsound results, and SRA Uns. shows the opposite case of Man. Uns. Finally,
Not Comp. shows the number of cases where the results of $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ and the
manual model are incomparable.

The $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model produced sound results for 99.4% of test cases and 94.0%
of functions. Moreover, $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ produced more precise results than the manual
models for 33.7% of test cases and 50.0% of functions. Although $\llbracket \cdot \rrbracket_{\mathsf{sra}}$
produced unsound results for 0.6% of test cases and 6.0% of functions, we found
soundness bugs in the manual models using 1.3% of test cases and 7.5% of
functions. Our experiments showed that the automatic $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model produced fewer
unsound results than the manual models. We reported the manual models
producing unsound results to the SAFE developers, together with the concrete
examples generated in the Run step that revealed the bugs.
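The percentages in the two paragraphs above follow from the Total row of Table 3. The Python snippet below recomputes them; it assumes, as the reported numbers indicate, that every column except SRA Uns. counts as a sound $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ result.

```python
# Column totals of Table 3 as (test cases, functions).
TOTALS = {
    "Equal": (994, 45),
    "SRA Pre.": (528, 67),
    "Man. Uns.": (20, 10),
    "Man. Pre.": (6, 2),
    "SRA Uns.": (10, 8),
    "Not Comp.": (7, 2),
}
N_CASES = sum(c for c, _ in TOTALS.values())  # 1565
N_FUNCS = sum(f for _, f in TOTALS.values())  # 134


def pct(count: int, total: int) -> float:
    return round(100 * count / total, 1)


sound = (N_CASES - TOTALS["SRA Uns."][0], N_FUNCS - TOTALS["SRA Uns."][1])
print(pct(sound[0], N_CASES), pct(sound[1], N_FUNCS))  # 99.4 94.0
print(pct(TOTALS["SRA Pre."][0], N_CASES), pct(TOTALS["SRA Pre."][1], N_FUNCS))  # 33.7 50.0
print(pct(TOTALS["Man. Uns."][0], N_CASES), pct(TOTALS["Man. Uns."][1], N_FUNCS))  # 1.3 7.5
```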
Table 1. Precision and soundness by functions in the benchmarks

Function | Equal | More Precise | Unsound
Array, Array.prototype.join, Array.prototype.push | 15 | 5 | 0
Date, Date.prototype.getTime | 0 | 4 | 0
Error | 5 | 0 | 0
Math.cos, Math.max, Math.pow, Math.sin, Math.sqrt | 11 | 0 | 0
Math.random | 0 | 0 | 1
Number.prototype.toString | 1 | 0 | 0
String, String.prototype.substring | 4 | 0 | 0
Total | 36 | 9 | 1
Proportion | 78.3% | 19.6% | 2.2%
Table 2. Analysis time overhead by programs in the benchmarks

Program | Manual Time (ms) | Manual #Call | SRA Time (ms) | SRA #Call | SRA #Max | SRA #Total | Increased Time Ratio
3d-morph.js | 1,423 | 50 | 2,641 | 50 | 16 | 408 | 1.86
access-binary-trees.js | 1,926,132 | 10 | 1,784,866 | 10 | 16 | 95 | 0.93
access-fannkuch.js | 1,615 | 31 | 2,627 | 31 | 15 | 413 | 1.63
access-nbody.js | 10,125 | 132 | 25,564 | 324 | 16 | 4,274 | 2.52
access-nsieve.js | 1,019 | 6 | 1,126 | 6 | 16 | 54 | 1.10
bitops-nsieve-bits.js | 282 | 1 | 343 | 1 | 2 | 2 | 1.22
math-cordic.js | 574 | 2 | 662 | 2 | 2 | 4 | 1.15
math-partial-sums.js | 1,613 | 99 | 4,703 | 99 | 16 | 916 | 2.92
math-spectral-norm.js | 10,702 | 6 | 10,986 | 6 | 16 | 96 | 1.03
string-fasta.js | 22,170 | 78 | 6,147 | 30 | 226 | 2,555 | 0.28
navier-stokes.js | 4,662 | 20 | 5,104 | 20 | 2 | 40 | 1.09
richards.js | 86,013 | 85 | 88,902 | 85 | 54 | 4,018 | 1.03
splay.js | 259,073 | 423 | 217,863 | 422 | 56 | 11,492 | 0.84
Total | 2,325,404 | 943 | 2,151,533 | 1,086 | 453 | 24,367 | 1.35


Answer to RQ3. The sampling strategy plays an important role in the
performance of $\llbracket \cdot \rrbracket_{\mathsf{sra}}$, especially for soundness. Our sampling strategy depends on
two factors: (1) manually sampled sets via the heuristic H and (2) each-used or
pair-wise selection for object samples. We used manually sampled sets for three
abstract values: UInt, NUInt, and OtherStr. To sample concrete values from
them, we used three methods: Base simply follows the guidelines described in
Sect. 3.1, Random generates samples randomly, and Final denotes the heuristics
we determined by trial and error to reach the highest ratio of sound results.
For object samples, we used three pair-wise options: HeapPair, ThisPair, and
ArgPair. For various sampling configurations, Table 4 summarizes the ratio of
sound results and the average and maximum numbers of samples for the test
cases used in RQ2.

Table 3. Precision and soundness for the builtin functions
(columns 2 to 8 count test cases; columns 9 to 15 count functions)

Object | Equal | SRA Pre. | Man. Uns. | Man. Pre. | SRA Uns. | Not Comp. | Total || Equal | SRA Pre. | Man. Uns. | Man. Pre. | SRA Uns. | Not Comp. | Total
Array | 59 | 114 | 1 | 0 | 0 | 0 | 174 || 8 | 7 | 1 | 0 | 0 | 0 | 16
Boolean | 37 | 2 | 3 | 0 | 0 | 0 | 42 || 1 | 0 | 3 | 0 | 0 | 0 | 4
Date | 74 | 241 | 0 | 2 | 1 | 1 | 319 || 8 | 35 | 0 | 2 | 1 | 1 | 47
Global | 7 | 1 | 0 | 0 | 0 | 0 | 8 || 1 | 1 | 0 | 0 | 0 | 0 | 2
Math | 106 | 5 | 0 | 0 | 6 | 0 | 117 || 11 | 2 | 0 | 0 | 5 | 1 | 18
Number | 41 | 71 | 0 | 3 | 0 | 1 | 116 || 1 | 6 | 0 | 0 | 0 | 0 | 8
Object | 370 | 24 | 7 | 1 | 3 | 5 | 410 || 12 | 2 | 5 | 0 | 2 | 0 | 21
String | 300 | 70 | 9 | 0 | 0 | 0 | 379 || 3 | 14 | 1 | 0 | 0 | 0 | 18
Total | 994 | 528 | 20 | 6 | 10 | 7 | 1565 || 45 | 67 | 10 | 2 | 8 | 2 | 134
Proportion | 63.5% | 33.7% | 1.3% | 0.4% | 0.6% | 0.4% | 100% || 33.6% | 50.0% | 7.5% | 1.5% | 6.0% | 1.5% | 100%

Table 4. Soundness and sampling cost for the builtin functions

Set heuristic (UInt / NUInt / OtherStr) | Pair-wise option (HeapPair / ThisPair / ArgPair) | Sound result ratio | #Ave. | #Max
Base / Base / Base | F / F / F | 85.0% | 17.4 | 41
Random / Random / Random | F / F / F | 84.9% | 17.4 | 41
Final / Final / Final | F / F / F | 92.1% | 32.6 | 98
Final / Final / Final | F / F / T | 93.5% | 38.1 | 226
Final / Final / Final | F / T / F | 95.0% | 181.9 | 4312
Final / Final / Final | F / T / T | 95.5% | 276.8 | 11752
Final / Final / Final | T / F / F | 96.2% | 323.0 | 7220
Final / Final / Final | T / F / T | 97.4% | 397.5 | 16498
Final / Final / Final | T / T / F | 99.2% | 513.7 | 11988
Final / Final / Final | T / T / T | 99.4% | 677.6 | 16498

The table shows that Base and Random produced sound results for 85.0%
and 84.9% (the worst case among 10 repetitions) of the test cases, respectively.
Even without any sophisticated heuristics or pair-wise options, $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ achieved
a decent ratio of sound results. Using more samples collected by trial and
error with Final and all three pair-wise options, $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ generated sound results
for 99.4% of the test cases by observing more behaviors of opaque code.


5.3 Limitations 


A fundamental limitation of our approach is that the $\llbracket \cdot \rrbracket_{\mathsf{sra}}$ model may produce
unsound results when the behavior of opaque code depends on values that $\llbracket \cdot \rrbracket_{\mathsf{sra}}$
does not cover via sampling. For example, if a sampling strategy calls the Date
function without enough time intervals, it may not be able to sample different
results. Similarly, if a sampling strategy does not use 4-wise combinations for
property descriptor objects that have four components, it cannot produce all the
possible combinations. At the same time, however, simply applying more complex
strategies like 4-wise combinations may lead to an explosion of samples, which
is not scalable.

Our experimental evaluation is inherently limited to a specific use case, which 
poses a threat to validity. While our approach itself is not dependent on a particu- 
lar programming language or static analysis, the implementation of our approach 
depends on the abstract domains of SAFE. Although the experiments used well- 
known benchmark programs as analysis subjects, they may not be representative 
of all common uses of opaque functions in JavaScript applications. 


6 Related Work 


When a textual specification or documentation is available for opaque code, 
one can generate semantic models by mining them. Zhai et al. [26] showed that 
natural language processing can successfully generate models for Java library 
functions and used them in the context of taint analysis for Android applications. 
Researchers also created models automatically from types written in WebIDL or 
TypeScript declarations to detect Web API misuses [2, 16]. 

Given an executable (e.g., binary) version of opaque code, researchers also
synthesized code by sampling the inputs and outputs of the code [7,10,12,19].
Heule et al. [8] collected partial execution traces, which capture the effects of 
opaque code on user objects, followed by code synthesis to generate models from 
these traces. This approach works in the absence of any specification and has 
been demonstrated on array-manipulating builtins. 

While all of these techniques are a priori attempts to generate general-purpose
models of opaque code that are usable by other analyses, researchers have
also proposed constructing models during analysis. Madsen et al.’s approach [14]
infers models of opaque functions by combining pointer analysis and use analysis,
which collects expected properties and their types from given application
code. Hirzel et al. [9] proposed an online pointer analysis for Java, which handles
native code and reflection via dynamic execution, as our approach also does. While
both approaches use only a finite set of pointers as their abstract values, ignoring
primitive values, our technique generalizes such online approaches to be usable
for all kinds of values in a given language.

Opaque code also matters in other program analyses, such as model
checking and symbolic execution. Shafiei and Breugel [22] proposed jpf-nhandler,
an extension of Java PathFinder (JPF), which transfers execution between JPF 
and the host JVM by on-the-fly code generation. It does not need concretization 
and abstraction since a JPF object represents a concrete value. In the context 
of symbolic execution, concolic testing [21] and other hybrid techniques that 
combine path solving with random testing [18] have been used to overcome the 
problems posed by opaque code, albeit sacrificing completeness [4]. 

Even when the source code of external libraries is available, substituting external
code with models rather than analyzing the code itself is useful to reduce the time
and memory that an analysis takes. Palepu et al. [15] generated summaries by
abstracting concrete data dependencies of library functions observed on a
training execution to avoid heavy execution of instrumented code. In model
checking, Tkachuk et al. [24,25] generated over-approximated summaries of
environments by points-to and side-effect analyses and presented a static analysis
tool, OCSEGen [23]. Another tool, Modgen [5], applies a program slicing technique
to reduce the complexity of library classes.


7 Conclusion 


Creating semantic models for static analysis by hand is complex, time-consuming and error-prone. We present a Sample-Run-Abstract approach (sra) as a promising way to perform static analysis in the presence of opaque code using automated on-demand modeling. We show how sra can be applied to the abstract domains of an existing JavaScript static analyzer, SAFE. For benchmark programs and 134 builtin functions with 1565 abstracted inputs, a tuned sra produced more sound results than the manual models, as well as concrete examples revealing bugs in the manual models. Although not all opaque code may be suitable for modeling with sra, it reduces the amount of hand-written models a static analyzer must provide. Future work on sra could focus on orthogonal testing techniques for sampling complex objects, and on practical optimizations such as caching computed model results.
Abstract. The Clock Constraint Specification Language (CCSL) is a formalism for specifying logical-time constraints on events for the design of real-time embedded systems. A central verification problem of CCSL is to check whether events are schedulable under logical constraints. Although many efforts have been made to address this problem, it is still open. In this paper, we show that the bounded scheduling problem is NP-complete and then propose an efficient SMT-based decision procedure that is sound and complete. Based on this decision procedure, we present a sound algorithm for the general scheduling problem. We implement our algorithm in a prototype tool and illustrate its utility in schedulability analysis for the design of real-world systems and in the automatic proof of algebraic properties of CCSL constraints. Experimental results demonstrate its effectiveness and efficiency.


Keywords: SMT - CCSL - Schedulability - Logical time - 
Real-time system 


1 Introduction 


Model-based design has been widely used, particularly in the design of safety- 
critical real-time embedded systems. It has achieved industrial successes through 
languages such as SCADE [12], AADL [15] and UML MARTE [26]. For example, 
UML MARTE provides syntactic annotations to implement, when the context 
allows, classical real-time scheduling algorithms such as EDF (Earliest Deadline 
First). It also provides a domain-specific language, the Clock Constraint Specification Language (CCSL) [3], to express the real-time behaviors of a system under development as logical constraints on system events, independently of any physical time and classical real-time scheduling algorithms. CCSL has been used in several industrial scenarios such as vehicle systems [16] and cyber-physical systems [10,22].
Model-based design usually starts with coarse-grained logical models that are progressively refined into more concrete ones until the final code deployment. It is well known that the earlier one can detect and fix bugs in the refinement process, the better [7]. Therefore, it is critical to provide efficient methods and tools to check safety, liveness and schedulability on the logical models, and not only on the final deployed system. This has motivated a large body of work on verifying whether events are schedulable under a set of constraints expressed in CCSL [11,21,28,33,35,36,38], though the decidability of this problem is still open. These works first transform CCSL constraints into other formal representations such as transition systems [21], Promela [35], Büchi automata [36], timed automata [33], rewriting logics [38], instant relations [28], or timed-interval logics [11], and then apply existing tools. However, these approaches usually suffer from the state explosion problem. Moreover, most of these works only deal with the so-called safe subset of CCSL, and the others only provide semi-algorithms. In our earlier work [39], we proposed an SMT-based verification approach to CCSL and demonstrated several applications of the approach to finding schedules, verifying temporal properties, proving constraint entailment, and analyzing the validity of system traces. Based on that approach, we implemented an efficient tool for verifying LTL properties of CCSL [40].

In this work we focus on the scheduling problem of CCSL, a fundamental problem to which the aforementioned verification problems of CCSL can be reduced. We first prove that the bounded scheduling problem of CCSL with a fixed bound is NP-complete. To our knowledge, this is the first result regarding the complexity of the scheduling problem of CCSL. Then, we propose a decision procedure for the bounded scheduling problem with a given bound. The decision procedure is based on the transformation of CCSL into SMT formulas [39]. It is sound, complete, and efficient in practice. Based on this decision procedure, we turn to the general (i.e., unbounded) scheduling problem and present a binary-search based algorithm. Our algorithm is sound, i.e., whenever it reports that the constraints are schedulable or unschedulable, the verdict is correct. We implemented our algorithms in a prototype tool and used it to analyze a real-world interlocking system in a rail transit system. Using the proposed approach, we also prove some algebraic properties of CCSL. The experimental results demonstrate the effectiveness and efficiency of the SMT-based approach.

The rest of this paper is organised as follows: Section 2 introduces CCSL. 
Section 3 defines the (bounded) scheduling problem of CCSL and shows that the 
bounded case is NP-complete. Section 4 presents an SMT-based decision proce- 
dure for the bounded scheduling problem and a sound algorithm for the gen- 
eral scheduling problem. Section 5 shows a case study and experimental results. 
Section 6 discusses related work, and Section 7 concludes the paper. 


2 The Clock Constraint Specification Language 


2.1 Logical Clock, History and Schedule 


In CCSL, clocks are used to model occurrences of events, where a clock ticks when the corresponding event occurs. For instance, a clock may represent an event such as the dispatch of a task, a communication between tasks, or the acquisition of a shared resource by a task. Constraints over clocks are used to specify causal and temporal relations between system events. No global physical time is presumed for the clocks and their constraints. This feature allows CCSL to define a polychronous specification of a system at a logical level.


Definition 1 (Logical clock). A (logical) clock $c$ is an infinite sequence of ticks $(c^i)_{i \in \mathbb{N}^+}$ with each $c^i$ being tick or idle, where $\mathbb{N}^+$ denotes the set of all non-zero natural numbers.


The value of $c^i$ denotes whether an event associated with $c$ occurs at step $i$: if $c^i$ is tick, then the event occurs; otherwise it does not. In particular, we denote by $\mathbb{1}$ a global reference logical clock that ticks at every step.


Definition 2 (Schedule). Given a set $C$ of clocks, a schedule of $C$ is a total function $\delta: \mathbb{N}^+ \to 2^C$ such that for all $i \in \mathbb{N}^+$, $\delta(i) = \{c \in C \mid c^i = \text{tick}\}$ and $\delta(i) \neq \emptyset$.


Intuitively, a schedule $\delta$ defines a partial order between the ticks of the clocks: $\delta(i)$ is the subset of $C$ such that $c \in \delta(i)$ iff $c$ ticks at step $i$. The condition $\delta(i) \neq \emptyset$ expresses that step $i$ cannot be empty, which forbids stuttering steps in schedules. As one can add or remove a finite number of empty steps without affecting schedulability, we exclude them from schedules for succinctness.

A clock can memorize the number of ticks that it has made. We use a history to represent this information.


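Definition 3 (History). Given a schedule $\delta$ for a set $C$ of clocks, a history of $\delta$ is a function $\chi_\delta : C \times \mathbb{N}^+ \to \mathbb{N}$ such that for each $c \in C$ and $i \in \mathbb{N}^+$:

$$\chi_\delta(c, i) = \begin{cases} 0, & \text{if } i = 1;\\ \chi_\delta(c, i-1), & \text{if } i > 1 \wedge c \notin \delta(i-1);\\ \chi_\delta(c, i-1) + 1, & \text{if } i > 1 \wedge c \in \delta(i-1). \end{cases}$$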


$\chi_\delta(c, i)$ represents the number of ticks that clock $c$ has made immediately before step $i$. (Note that the tick of $c$ at step $i$ is excluded from $\chi_\delta(c, i)$.) For simplicity, we may write $\chi$ for $\chi_\delta$ if it is clear from the context.
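As an illustration of Definitions 2 and 3, the following Python sketch computes histories on a finite schedule prefix. It assumes that a prefix is a list of sets of clock names, with step $i$ stored at index $i-1$; the helper names are our own and are not part of CCSL or of the tool described later.

```python
from typing import Dict, Iterable, List, Set

Schedule = List[Set[str]]  # step i (1-indexed) is stored at index i - 1


def is_non_stuttering(schedule: Schedule) -> bool:
    """Definition 2: every step of a schedule must be non-empty."""
    return all(len(step) > 0 for step in schedule)


def history(schedule: Schedule, clocks: Iterable[str]) -> Dict[str, List[int]]:
    """Return chi with chi[c][i - 1] == chi(c, i) for i = 1 .. len(schedule) + 1.

    Follows Definition 3: chi(c, 1) = 0, and chi(c, i) is chi(c, i - 1)
    plus one exactly when c ticked at step i - 1.
    """
    chi = {c: [0] for c in clocks}
    for step in schedule:
        for c, counts in chi.items():
            counts.append(counts[-1] + (1 if c in step else 0))
    return chi


prefix = [{"a"}, {"a", "b"}, {"b"}]
print(history(prefix, ["a", "b"]))
# {'a': [0, 1, 2, 2], 'b': [0, 0, 1, 2]}
```

Each returned list has one more entry than the prefix has steps, because $\chi(c, k+1)$ already accounts for the tick at step $k$.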


2.2 Syntax and Semantics of CCSL 


CCSL consists of 11 kinds of constraints: 4 of them are binary relations specifying the precedence, causality, subclocking, and exclusion relations between clocks, and the others are used to define clocks from existing ones. Clocks defined by constraints may correspond to system events or be introduced as auxiliary clocks that do not correspond to any event.
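Table 1. Semantics of CCSL with respect to schedules

Constraint   | $\phi$ | $\delta \models \phi$ iff
Precedence   | $c_1\,[b] \prec c_2$ | $\forall n \in \mathbb{N}^+.\ \chi(c_2,n) - \chi(c_1,n) = b \Rightarrow c_2 \notin \delta(n)$
Causality    | $c_1 \preccurlyeq c_2$ | $\forall n \in \mathbb{N}^+.\ \chi(c_1,n) \geq \chi(c_2,n)$
Subclock     | $c_1 \subseteq c_2$ | $\forall n \in \mathbb{N}^+.\ c_1 \in \delta(n) \Rightarrow c_2 \in \delta(n)$
Exclusion    | $c_1 \# c_2$ | $\forall n \in \mathbb{N}^+.\ c_1 \notin \delta(n) \vee c_2 \notin \delta(n)$
Union        | $c_1 \triangleq c_2 + c_3$ | $\forall n \in \mathbb{N}^+.\ c_1 \in \delta(n) \Leftrightarrow (c_2 \in \delta(n) \vee c_3 \in \delta(n))$
Intersection | $c_1 \triangleq c_2 \ast c_3$ | $\forall n \in \mathbb{N}^+.\ c_1 \in \delta(n) \Leftrightarrow (c_2 \in \delta(n) \wedge c_3 \in \delta(n))$
Infimum      | $c_1 \triangleq c_2 \wedge c_3$ | $\forall n \in \mathbb{N}^+.\ \chi(c_1,n) = \max(\chi(c_2,n), \chi(c_3,n))$
Supremum     | $c_1 \triangleq c_2 \vee c_3$ | $\forall n \in \mathbb{N}^+.\ \chi(c_1,n) = \min(\chi(c_2,n), \chi(c_3,n))$
Periodicity  | $c_1 \triangleq c_2 \propto p$ | $\forall n \in \mathbb{N}^+.\ c_1 \in \delta(n) \Leftrightarrow (c_2 \in \delta(n) \wedge \exists m \in \mathbb{N}^+.\ \chi(c_2,n) = m \times p - 1)$
Filtering    | $c_1 \triangleq c_2 \blacktriangledown w$ | $\forall n \in \mathbb{N}^+.\ c_1 \in \delta(n) \Leftrightarrow (c_2 \in \delta(n) \wedge w[n])$
DelayFor     | $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_3$ | $\forall n \in \mathbb{N}^+.\ c_1 \in \delta(n) \Leftrightarrow (c_3 \in \delta(n) \wedge \exists m \in \mathbb{N}^+.\ (c_2 \in \delta(m) \wedge \chi(c_3,n) - \chi(c_3,m) = d))$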


Definition 4 (Syntax). A CCSL constraint $\phi$ has one of the following forms:

Precedence: $c_1\,[b] \prec c_2$ | Causality: $c_1 \preccurlyeq c_2$
Subclock: $c_1 \subseteq c_2$ | Exclusion: $c_1 \# c_2$
Union: $c_1 \triangleq c_2 + c_3$ | Intersection: $c_1 \triangleq c_2 \ast c_3$
Infimum: $c_1 \triangleq c_2 \wedge c_3$ | Supremum: $c_1 \triangleq c_2 \vee c_3$
Periodicity: $c_1 \triangleq c_2 \propto p$ | Filtering: $c_1 \triangleq c_2 \blacktriangledown w$
DelayFor: $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_3$

where $b \geq 0$, $d > 0$ and $p > 0$ are natural numbers, $c_1, c_2, c_3$ are logical clocks and $w$ is a (possibly infinite) word over $\{0,1\}$ expressed as an ($\omega$-)regular expression.


For simplicity of presentation, we write $c_1 \prec c_2$ for the constraint $c_1\,[0] \prec c_2$, and $c_1 \triangleq c_2 \mathbin{\$} d$ for the constraint $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_3$ with $c_2 = c_3$.

The semantics of CCSL constraints is defined over schedules. Given a CCSL constraint $\phi$ and a schedule $\delta$, the satisfaction relation $\delta \models \phi$ (i.e., $\delta$ satisfies constraint $\phi$) is defined in Table 1.
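Most rows of Table 1 translate directly into checks on a finite schedule prefix. The sketch below covers the first eight rows in Python, under the assumptions that a prefix is a list of sets of clock names (step $n$ at index $n-1$) and that each $\forall n$ ranges only over the steps of the prefix; it is an illustrative reading of the table, not the paper's implementation.

```python
from typing import Callable, List, Set

Schedule = List[Set[str]]  # step n (1-indexed) is stored at index n - 1


def chi(s: Schedule, c: str, n: int) -> int:
    """History chi(c, n): ticks of c strictly before step n."""
    return sum(1 for step in s[: n - 1] if c in step)


def forall(s: Schedule, cond: Callable[[int], bool]) -> bool:
    """Evaluate 'for all n' over the steps 1..k of a finite prefix."""
    return all(cond(n) for n in range(1, len(s) + 1))


def precedence(s, c1, c2, b=0):      # c1 [b] precedes c2
    return forall(s, lambda n: not (chi(s, c2, n) - chi(s, c1, n) == b
                                    and c2 in s[n - 1]))


def causality(s, c1, c2):            # c1 causes c2
    return forall(s, lambda n: chi(s, c1, n) >= chi(s, c2, n))


def subclock(s, c1, c2):             # c1 is a subclock of c2
    return forall(s, lambda n: c1 not in s[n - 1] or c2 in s[n - 1])


def exclusion(s, c1, c2):            # c1 and c2 never tick together
    return forall(s, lambda n: c1 not in s[n - 1] or c2 not in s[n - 1])


def union(s, c1, c2, c3):            # c1 = c2 + c3
    return forall(s, lambda n: (c1 in s[n - 1]) == (c2 in s[n - 1] or c3 in s[n - 1]))


def intersection(s, c1, c2, c3):     # c1 = c2 * c3
    return forall(s, lambda n: (c1 in s[n - 1]) == (c2 in s[n - 1] and c3 in s[n - 1]))


def infimum(s, c1, c2, c3):          # c1 = infimum of c2 and c3
    return forall(s, lambda n: chi(s, c1, n) == max(chi(s, c2, n), chi(s, c3, n)))


def supremum(s, c1, c2, c3):         # c1 = supremum of c2 and c3
    return forall(s, lambda n: chi(s, c1, n) == min(chi(s, c2, n), chi(s, c3, n)))


alternating = [{"a"}, {"b"}, {"a"}, {"b"}]
print(precedence(alternating, "a", "b"), precedence(alternating, "b", "a"))  # True False
```

Periodicity, filtering and delayFor follow the same pattern but additionally need the word $w$ or the existential over $m$, so they are left out here for brevity.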

The precedence constraint $c_1 \prec c_2$ (i.e., $c_1\,[0] \prec c_2$) expresses that clock $c_1$ strictly precedes clock $c_2$. Suppose there is an unbounded buffer with two operations, fetch and store, which respectively fetch data from and store data into the buffer. Fetching is only allowed when the buffer is nonempty. If the buffer is initially empty, each store operation must strictly precede the corresponding fetch operation. This behavior can be expressed by the constraint $\mathit{store} \prec \mathit{fetch}$. Likewise, the precedence constraint can be used to represent reentrant tasks by replacing store with start and fetch with finish.

The general precedence constraint $c_1\,[b] \prec c_2$ can specify the difference $b$ between the numbers of occurrences of the two clocks before the precedence takes effect. Hence, it can express more complicated relations. For instance, if the buffer is initially nonempty, fetch operations can be performed before any store operation. Fig. 1 shows such a scenario, where 4 elements are initially present in the buffer. This behavior can be represented as $\mathit{store}\,[4] \prec \mathit{fetch}$.
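A small sketch of the buffer example follows, assuming the list-of-sets representation of schedule prefixes and a hypothetical helper `precedence_ok` that evaluates the precedence row of Table 1. It shows that four fetches are allowed before the first store, while a fifth one is blocked.

```python
from typing import List, Set


def chi(schedule: List[Set[str]], c: str, n: int) -> int:
    """Ticks of clock c strictly before step n (1-indexed)."""
    return sum(1 for step in schedule[: n - 1] if c in step)


def precedence_ok(schedule: List[Set[str]], c1: str, c2: str, b: int) -> bool:
    """c1 [b] precedes c2 on every step of the prefix: whenever c2 is already
    b ticks ahead of c1, c2 must not tick."""
    for n in range(1, len(schedule) + 1):
        if chi(schedule, c2, n) - chi(schedule, c1, n) == b and c2 in schedule[n - 1]:
            return False
    return True


four_fetches = [{"fetch"}] * 4                    # drain the 4 initial elements
print(precedence_ok(four_fetches, "store", "fetch", 4))                           # True
print(precedence_ok(four_fetches + [{"fetch"}], "store", "fetch", 4))             # False
print(precedence_ok(four_fetches + [{"store"}, {"fetch"}], "store", "fetch", 4))  # True
```

With $b = 0$ (an initially empty buffer), even a single fetch at step 1 is rejected.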

Fig. 1. Example for $\mathit{store}\,[4] \prec \mathit{fetch}$

The causality, subclock and exclusion constraints are straightforward. The causality constraint $c_1 \preccurlyeq c_2$ specifies that the occurrence of $c_2$ must be caused by the occurrence of $c_1$, namely at any moment $c_1$ must have ticked at least as many times as $c_2$. The subclock constraint $c_1 \subseteq c_2$ expresses that $c_1$ occurs at some step only if $c_2$ occurs at this step as well. The exclusion constraint $c_1 \# c_2$ specifies that the two clocks $c_1$ and $c_2$ are exclusive, i.e., they cannot occur at the same step.

The union and intersection constraints are used to define clocks. $c_1 \triangleq c_2 + c_3$ defines a clock $c_1$ that ticks iff $c_2$ or $c_3$ ticks. Similarly, $c_1 \triangleq c_2 \ast c_3$ defines a clock $c_1$ that ticks iff both $c_2$ and $c_3$ tick. The infimum (resp. supremum) constraint $c_1 \triangleq c_2 \wedge c_3$ (resp. $c_1 \triangleq c_2 \vee c_3$) defines a clock $c_1$ that is the slowest (resp. fastest) clock that is faster (resp. slower) than both $c_2$ and $c_3$. These two constraints are useful for expressing delay requirements between two events. Note that a clock $c_1$ defined by such a constraint may correspond to a system event; otherwise it is an auxiliary clock. In the former case, the constraint can be seen as specifying a relation between clocks $c_1$, $c_2$ and $c_3$.

The periodicity constraint $c_1 \triangleq c_2 \propto p$ defines a clock $c_1$ that ticks once every $p$ occurrences of clock $c_2$. It is worth mentioning that a periodicity constraint defined in this way is relative because of the logical nature of CCSL clocks: clock $c_1$ is relatively periodic with respect to clock $c_2$. Since CCSL does not assume the existence of a global reference clock, most relations are defined relative to other clocks. These notions extend the equivalent behaviors that are usually defined relative to physical time. If $c_2$ represents a sensor that measures physical time, then $c_1$ becomes physically periodic.

The filtering constraint $c_1 \triangleq c_2 \blacktriangledown w$ defines a clock $c_1$ that can be seen as snapshots of the clock $c_2$ at some steps, according to the ($\omega$-)regular expression $w$. For instance, $c_1 \triangleq c_2 \blacktriangledown (01)^\omega$ expresses that $c_1$ simulates $c_2$ at every even step. It defines a logically periodic behavior of $c_1$ with respect to $c_2$.

The delayFor constraint $c_1 \triangleq c_2 \mathbin{\$} d$ (i.e., $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_2$) defines a new clock $c_1$ that is $c_2$ delayed by $d$ of its own ticks. The general form $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_3$ defines a new clock $c_1$ that is $c_2$ delayed by $d$ ticks of $c_3$, so $c_1$ can be seen as a sampled clock of $c_2$ on the basis of $c_3$. For instance, $c_1 \triangleq c_2 \mathbin{\$} 1 \text{ on } c_3$ denotes that whenever $c_2$ ticks at least once between two successive ticks of $c_3$ at steps $m$ and $n$, $c_1$ must tick at step $n$.
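Given a prefix in which $c_2$ and $c_3$ are known, the delayFor row of Table 1 determines when $c_1$ ticks. The following Python sketch computes those steps by literally evaluating that row; the representation (a list of sets, step $i$ at index $i-1$) and the function name `delay_for` are assumptions made for illustration.

```python
from typing import List, Set


def delay_for(schedule: List[Set[str]], c2: str, d: int, c3: str) -> List[int]:
    """Steps n (1-indexed) at which c1 = c2 $ d on c3 ticks, following Table 1:
    c1 ticks at n iff c3 ticks at n and some step m has a tick of c2
    with chi(c3, n) - chi(c3, m) == d."""
    k = len(schedule)
    # chi3[i - 1] == chi(c3, i): ticks of c3 strictly before step i
    chi3 = [0]
    for step in schedule:
        chi3.append(chi3[-1] + (1 if c3 in step else 0))
    ticks = []
    for n in range(1, k + 1):
        if c3 not in schedule[n - 1]:
            continue
        if any(c2 in schedule[m - 1] and chi3[n - 1] - chi3[m - 1] == d
               for m in range(1, k + 1)):
            ticks.append(n)
    return ticks


lights = [{"green"}, {"red"}] * 3          # green at steps 1, 3, 5
print(delay_for(lights, "green", 1, "green"))   # [3, 5]
```

For an alternating green/red prefix, $\mathit{green} \mathbin{\$} 1$ ticks together with every green tick except the first, which matches the auxiliary clock tmp in the traffic-light example of Section 3.1.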




3 Scheduling Problem of CCSL 


3.1 Schedulability 


Given a set $\Phi$ of CCSL constraints, a schedule $\delta$ satisfies $\Phi$, denoted by $\delta \models \Phi$, iff $\delta \models \phi$ for all constraints $\phi \in \Phi$.
Fig. 2. The unique schedule that satisfies the three constraints in the example


Definition 5 (Logical time scheduling problem). Given a set $\Phi$ of CCSL constraints, the (logical time) scheduling problem of CCSL is to determine whether there exists a schedule $\delta$ such that $\delta \models \Phi$.


We illustrate the scheduling problem with a simple example: specifying in CCSL a green and a red light that flicker alternately. We assume that the green light is turned on first. The timing requirements can be formalized by the following three constraints:

$$\mathit{green} \prec \mathit{red}, \qquad \mathit{tmp} \triangleq \mathit{green} \mathbin{\$} 1, \qquad \mathit{red} \prec \mathit{tmp},$$

where the clocks green and red represent the events that the green and the red light, respectively, is turned on, and tmp is an auxiliary clock used to help specify the constraints.

There exists exactly one schedule satisfying the three constraints, as shown in Fig. 2. In this schedule, the clock tmp has the same behavior as green from step 2 on, while the clock red has the opposite behavior to green. Namely, red and green operate in an alternating manner. For simplicity, we also write $\mathit{green} \sim \mathit{red}$ to denote the alternation relation of the two clocks.
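The uniqueness claim can be checked exhaustively on short prefixes. The Python sketch below assumes our own encoding of prefixes as tuples of frozensets and hand-written checks for the three constraints following Table 1; it enumerates all non-stuttering 4-step candidates over {green, red, tmp} and keeps those satisfying the constraints. A bounded search like this is only evidence about the first $k$ steps, not a proof about infinite schedules.

```python
from itertools import product
from typing import FrozenSet, List, Sequence

CLOCKS = ("green", "red", "tmp")
Prefix = Sequence[FrozenSet[str]]  # step n (1-indexed) at index n - 1


def chi(s: Prefix, c: str, n: int) -> int:
    """Ticks of clock c strictly before step n."""
    return sum(1 for step in s[: n - 1] if c in step)


def precedes(s: Prefix, c1: str, c2: str) -> bool:
    """c1 [0] precedes c2: c2 may not tick while chi(c2) == chi(c1)."""
    return all(not (chi(s, c2, n) == chi(s, c1, n) and c2 in s[n - 1])
               for n in range(1, len(s) + 1))


def delayed(s: Prefix, c1: str, c2: str, d: int) -> bool:
    """c1 = c2 $ d (on c2), evaluated as in the delayFor row of Table 1."""
    k = len(s)
    return all(
        (c1 in s[n - 1]) == (c2 in s[n - 1] and any(
            c2 in s[m - 1] and chi(s, c2, n) - chi(s, c2, m) == d
            for m in range(1, k + 1)))
        for n in range(1, k + 1))


def satisfies(s: Prefix) -> bool:
    return (precedes(s, "green", "red")
            and delayed(s, "tmp", "green", 1)
            and precedes(s, "red", "tmp"))


def bounded_schedules(k: int) -> List[Prefix]:
    """All k-step candidates built from non-empty steps that satisfy the constraints."""
    steps = [frozenset(c for c, on in zip(CLOCKS, bits) if on)
             for bits in product((False, True), repeat=len(CLOCKS))]
    steps = [st for st in steps if st]        # 2^|C| - 1 = 7 choices per step
    return [cand for cand in product(steps, repeat=k) if satisfies(cand)]


solutions = bounded_schedules(4)
print(len(solutions))                         # 1
print([sorted(st) for st in solutions[0]])
# [['green'], ['red'], ['green', 'tmp'], ['red']]
```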

Although one may be able to find one or more schedules for some simple constraints, to our knowledge there is no generally applicable decision procedure for the scheduling problem of full CCSL. There are two main challenges. First, schedules are essentially infinite, i.e., defined on all the natural numbers. Second, precedence is stateful, i.e., it depends on the history, and there is no upper bound on how far back in the history one must go, so storing the history may require infinite memory. As a first step toward this challenging problem, in this work we first consider the bounded scheduling problem.


3.2 Bounded Scheduling Problem 


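Given a bound $k \in \mathbb{N}^+$, let $\sigma : \mathbb{N}^+_k \to 2^C$ be a function, where $\mathbb{N}^+_k := \{1, \ldots, k\}$. $\sigma$ is a $k$-bounded schedule of a set $\Phi$ of CCSL constraints, denoted by $\sigma \models_k \Phi$, iff there exists a schedule $\delta$ such that $\delta(i) = \sigma(i)$ for every $i \in \mathbb{N}^+_k$ and $\delta \models \Phi$ from step 1 up to step $k$.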
Definition 6 (Bounded scheduling problem). The bounded scheduling problem is to determine, for a given set $\Phi$ of CCSL constraints and a bound $k$, whether there is a $k$-bounded schedule $\sigma$ for $\Phi$, i.e., $\sigma \models_k \Phi$.


Theorem 1 (Sufficient condition of unschedulability). If a set $\Phi$ of constraints has no $k$-bounded schedule for some $k \in \mathbb{N}^+$, then $\Phi$ is unschedulable.


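The proof is straightforward by contradiction: if some schedule $\delta$ satisfied $\Phi$, then its restriction to $\mathbb{N}^+_k$ would be a $k$-bounded schedule of $\Phi$.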

It is easy to see that the bounded scheduling problem is decidable, as there are finitely many potential $k$-bounded schedules, namely $(2^{|C|} - 1)^k$, where $|C|$ denotes the number of clocks. Furthermore, the satisfiability problem of Boolean formulas can be reduced to the bounded scheduling problem in polynomial time.
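The decidability argument above is simply exhaustive enumeration. A Python sketch of it follows, assuming a constraint set is given as a checker function over finite prefixes; `find_bounded_schedule` and the example constraint set are hypothetical. The example $\{c_1 \prec c_2, c_2 \prec c_1\}$ has no 1-bounded schedule, since neither clock may tick at step 1, so by Theorem 1 it is unschedulable.

```python
from itertools import combinations, product
from typing import Callable, FrozenSet, List, Optional, Sequence

Step = FrozenSet[str]
Prefix = Sequence[Step]          # step n (1-indexed) at index n - 1


def nonempty_steps(clocks: Sequence[str]) -> List[Step]:
    """The 2^|C| - 1 non-empty subsets of C (empty steps are forbidden)."""
    return [frozenset(sub) for r in range(1, len(clocks) + 1)
            for sub in combinations(clocks, r)]


def find_bounded_schedule(clocks: Sequence[str], k: int,
                          check: Callable[[Prefix], bool]) -> Optional[Prefix]:
    """Exhaustively try all (2^|C| - 1)^k candidates; return one satisfying check."""
    for cand in product(nonempty_steps(clocks), repeat=k):
        if check(cand):
            return cand
    return None


def chi(s: Prefix, c: str, n: int) -> int:
    """Ticks of clock c strictly before step n."""
    return sum(1 for step in s[: n - 1] if c in step)


def precedes(s: Prefix, c1: str, c2: str) -> bool:
    """c1 [0] precedes c2, as in the precedence row of Table 1."""
    return all(not (chi(s, c2, n) == chi(s, c1, n) and c2 in s[n - 1])
               for n in range(1, len(s) + 1))


def cyclic(s: Prefix) -> bool:
    """Hypothetical constraint set {c1 precedes c2, c2 precedes c1}."""
    return precedes(s, "c1", "c2") and precedes(s, "c2", "c1")


print(len(nonempty_steps(["a", "b", "c"])))              # 7
print(find_bounded_schedule(["c1", "c2"], 1, cyclic))    # None
```

The search is exponential in both $|C|$ and $k$, which is why the paper turns to SMT solving for non-trivial instances.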


Theorem 2. The $k$-bounded scheduling problem of CCSL is NP-complete, even if $k = 1$.


Proof. The NP upper bound can be proved easily based on the facts that the number of possible $k$-bounded schedules is finite and that the universal quantification $\forall n \in \mathbb{N}^+_k$ can be eliminated by enumerating all the possible values in $\mathbb{N}^+_k$.

We prove NP-hardness by a reduction from the satisfiability problem of Boolean formulas, which is known to be NP-complete. Consider the Boolean formula $\phi = \bigwedge_{i=1}^{m} (l_i^1 \vee l_i^2 \vee l_i^3)$, where $m \in \mathbb{N}^+$ and each $l_i^j$ for $j \in \{1,2,3\}$ is either a Boolean variable $x$ or its negation $\neg x$. Let $Var(\phi)$ denote the set of Boolean variables appearing in $\phi$. We construct a set of CCSL constraints as follows.

For each $x \in Var(\phi)$, we introduce two clocks $x^+$ and $x^-$. Let $enc(x) = x^+$ and $enc(\neg x) = x^-$. Each clause $l_i^1 \vee l_i^2 \vee l_i^3$ in $\phi$ is encoded as the CCSL constraint $c_i \triangleq enc(l_i^1) + enc(l_i^2) + enc(l_i^3)$, denoted by $\psi_i$. Note that this constraint can be transformed into standard CCSL constraints by introducing one auxiliary clock $c$, i.e., $\{c_i \triangleq enc(l_i^1) + enc(l_i^2) + enc(l_i^3)\} = \{c_i \triangleq enc(l_i^1) + c,\ c \triangleq enc(l_i^2) + enc(l_i^3)\}$.

Let $enc(\phi)$ denote the following set of CCSL constraints:

$$enc(\phi) = \{\mathbb{1} \triangleq \textstyle\ast_{i=1}^{m} c_i,\ \psi_1, \ldots, \psi_m\} \cup \{x^+ \# x^-,\ \mathbb{1} \triangleq x^+ + x^- \mid x \in Var(\phi)\},$$

where $x^+ \# x^-$ and $\mathbb{1} \triangleq x^+ + x^-$ enforce that exactly one of $x^+$ and $x^-$ ticks at each step. This encodes that either $x$ is true or $\neg x$ is true. Note that $\mathbb{1} \triangleq \ast_{i=1}^{m} c_i$ is a shorthand for $\mathbb{1} \triangleq c_1 \ast \cdots \ast c_m$, and can also be expressed in CCSL by introducing polynomially many auxiliary clocks. For instance, $\{c \triangleq c_1 \ast c_2 \ast c_3\} = \{c \triangleq c_1 \ast d,\ d \triangleq c_2 \ast c_3\}$. We can show that $\phi$ is satisfiable iff $enc(\phi)$ is 1-bounded schedulable. Since the satisfiability problem of Boolean formulas is NP-complete, the 1-bounded scheduling problem of CCSL is NP-hard. NP-hardness for $k > 1$ immediately follows by repeating the ticks of the clocks at the first step.
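The reduction can be exercised on small formulas. The Python sketch below builds the single-step check for $enc(\phi)$ and compares 1-bounded schedulability with satisfiability by brute force. It assumes DIMACS-style integer literals, names the reference clock `one`, and checks the $n$-ary union and intersection directly instead of introducing the auxiliary clocks used in the proof; all names are hypothetical.

```python
from itertools import combinations, product
from typing import List

Clause = List[int]  # DIMACS-style literals: v stands for x_v, -v for its negation


def enc(lit: int) -> str:
    """enc(x) = x+ and enc(not x) = x-, as clock names."""
    return f"x{abs(lit)}+" if lit > 0 else f"x{abs(lit)}-"


def step_ok(step: frozenset, cnf: List[Clause], variables: List[int]) -> bool:
    """Check enc(phi) on a single step (k = 1); 'one' is the reference clock."""
    if "one" not in step:                             # 'one' ticks at every step
        return False
    cs = [f"c{i}" for i in range(len(cnf))]
    if not all(c in step for c in cs):                # one = c_1 * ... * c_m
        return False
    for c, clause in zip(cs, cnf):                    # psi_i: c_i = union of enc(l)
        if (c in step) != any(enc(lit) in step for lit in clause):
            return False
    for v in variables:
        pos, neg = enc(v), enc(-v)
        if pos in step and neg in step:               # x+ # x-
            return False
        if not (pos in step or neg in step):          # one = x+ + x-
            return False
    return True


def one_bounded_schedulable(cnf: List[Clause], variables: List[int]) -> bool:
    clocks = (["one"] + [f"c{i}" for i in range(len(cnf))]
              + [enc(sign * v) for v in variables for sign in (1, -1)])
    return any(step_ok(frozenset(sub), cnf, variables)
               for r in range(1, len(clocks) + 1)
               for sub in combinations(clocks, r))


def satisfiable(cnf: List[Clause], variables: List[int]) -> bool:
    for bits in product((False, True), repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            return True
    return False


for phi in ([[1, 2, 2], [-1, 2, 2]], [[1, 1, 1], [-1, -1, -1]]):
    print(satisfiable(phi, [1, 2]), one_bounded_schedulable(phi, [1, 2]))
# True True
# False False
```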


Theorem 2 shows that the bounded scheduling problem is NP-complete, so (unless P = NP) no algorithm for it can be efficient in the worst case. Thus, we need practical solutions that are efficient on typical instances. In the next section, we propose an SMT-based decision procedure for the bounded scheduling problem and a sound algorithm for the scheduling problem. Thanks to advances in state-of-the-art SMT solvers such as Z3 [25], our approach is usually efficient in practice.
4 Decision Procedure for the Scheduling Problem


4.1 Transformation from CCSL into SMT 


Let us fix a set $\Phi$ of CCSL constraints defined over a set $C$ of clocks. Each clock $c \in C$ is interpreted as a predicate $t_c : \mathbb{N}^+ \to \mathit{Bool}$ such that for all $i \in \mathbb{N}^+$, $t_c(i)$ is true iff clock $c$ ticks at $i$, where $\mathit{Bool}$ denotes the Boolean sort. A schedule $\delta$ of $\Phi$ is encoded as a set of predicates $T_C = \{t_c \mid c \in C\}$ such that the following condition holds for all $t_c \in T_C$:

$$\forall i \in \mathbb{N}^+.\ t_c(i) \Leftrightarrow c \in \delta(i).$$


Recalling that schedules forbid stuttering steps, this is enforced by requiring the predicates $t_c$ in $T_C$ to satisfy the following condition:

$$\forall i \in \mathbb{N}^+.\ \bigvee_{c \in C} t_c(i) \qquad \text{(F1)}$$

Formula F1 specifies that at each step $i$ at least one clock $c$ ticks, i.e., $t_c(i)$ holds.
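For each clock $c \in C$, we introduce an auxiliary function $h_c : \mathbb{N}^+ \to \mathbb{N}$ to encode its history. For each $i \in \mathbb{N}^+$,

$$h_c(i) := \begin{cases} 0, & \text{if } i = 1;\\ h_c(i-1), & \text{if } i > 1 \wedge \neg t_c(i-1);\\ h_c(i-1) + 1, & \text{if } i > 1 \wedge t_c(i-1). \end{cases} \qquad \text{(F2)}$$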


Intuitively, $h_c(i)$ is equivalent to $\chi(c, i)$ for each $i \in \mathbb{N}^+$. The set of all the auxiliary functions is denoted by $H_C$.

By replacing each occurrence of $c \in \delta(n)$ (resp. $c \notin \delta(n)$) with $t_c(n)$ (resp. $\neg t_c(n)$) and $\chi(c, n)$ with $h_c(n)$ in the definition of each CCSL constraint, each CCSL constraint $\phi$ can be encoded as an SMT formula $[\![\phi]\!]$.

We use $[\![\Phi]\!]$ to denote the conjunction of Formulas F1 and F2 and the SMT encodings of the CCSL constraints in $\Phi$. Formally,

$$[\![\Phi]\!] = F1 \wedge F2 \wedge \Big(\bigwedge_{\phi \in \Phi} [\![\phi]\!]\Big).$$


Finding a schedule for $\Phi$ amounts to finding a solution, i.e., definitions of the predicates in $T_C$, that satisfies $[\![\Phi]\!]$.


Proposition 1. $\Phi$ has a schedule iff $[\![\Phi]\!]$ is satisfiable.


The scheduling problem of $\Phi$ is thus transformed into the satisfiability problem of the formula $[\![\Phi]\!]$. However, according to the SMT-LIB standard [4], $[\![\Phi]\!]$ belongs to the logic UFLIA (formulas with Uninterpreted Functions and Linear Integer Arithmetic), whose satisfiability problem is undecidable in general. Nevertheless, the SMT encoding is still useful for solving the bounded scheduling problem, as we show in the next subsection.
4.2 Decision Procedure for the Bounded Scheduling Problem


For the $k$-bounded scheduling problem, it suffices to consider schedules $\delta : \mathbb{N}^+_k \to 2^C$. Moreover, the quantifiers in $[\![\Phi]\!]$ can be eliminated once the bound $k$ is fixed.


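Hence, we can resort to state-of-the-art SMT solvers. Formally, let $[\![\Phi]\!]_k$ be the formula obtained from $[\![\Phi]\!] = F1 \wedge F2 \wedge (\bigwedge_{\phi \in \Phi} [\![\phi]\!])$ by

– restricting the domain of the predicates $t_c \in T_C$ and the functions $h_c \in H_C$ to $\mathbb{N}^+_k$, and
– replacing the quantifications $\forall n \in \mathbb{N}^+$ and $\exists m \in \mathbb{N}^+$ with $\forall n \in \mathbb{N}^+_k$ and $\exists m \in \mathbb{N}^+_k$ in $(\bigwedge_{\phi \in \Phi} [\![\phi]\!])$.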
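To illustrate the unrolling for a fixed $k$, the following Python sketch prints an SMT-LIB 2 script for F1, F2 and precedence constraints only, with one Boolean constant per $t_c(i)$ and one integer constant per $h_c(i)$. The naming scheme `t_c_i`/`h_c_i`, the choice of the quantifier-free logic QF_LIA after unrolling, and the restriction to precedence are our assumptions; the other rows of Table 1 would be unrolled in the same way, and the script can be given to any SMT-LIB 2 compliant solver such as Z3. Clock names are assumed to be valid SMT-LIB symbols.

```python
from typing import List, Tuple


def smt_bounded(clocks: List[str], k: int,
                precedences: List[Tuple[str, str, int]]) -> str:
    """Emit an SMT-LIB 2 script for F1, F2 and constraints c1 [b] precedes c2,
    with the quantifiers over n unrolled to 1..k."""
    out = ["(set-logic QF_LIA)"]
    for c in clocks:
        for i in range(1, k + 1):
            out.append(f"(declare-const t_{c}_{i} Bool)")   # t_c(i)
            out.append(f"(declare-const h_{c}_{i} Int)")    # h_c(i)
    # F1: at least one clock ticks at every step (no stuttering)
    for i in range(1, k + 1):
        if len(clocks) == 1:
            out.append(f"(assert t_{clocks[0]}_{i})")
        else:
            disj = " ".join(f"t_{c}_{i}" for c in clocks)
            out.append(f"(assert (or {disj}))")
    # F2: h_c(1) = 0, and h_c(i) increments exactly when c ticked at step i - 1
    for c in clocks:
        out.append(f"(assert (= h_{c}_1 0))")
        for i in range(2, k + 1):
            out.append(f"(assert (= h_{c}_{i} "
                       f"(ite t_{c}_{i - 1} (+ h_{c}_{i - 1} 1) h_{c}_{i - 1})))")
    # Precedence row of Table 1: h_c2(n) - h_c1(n) = b implies not t_c2(n)
    for c1, c2, b in precedences:
        for n in range(1, k + 1):
            out.append(f"(assert (=> (= (- h_{c2}_{n} h_{c1}_{n}) {b}) "
                       f"(not t_{c2}_{n})))")
    out.append("(check-sat)")
    return "\n".join(out)


print(smt_bounded(["store", "fetch"], 2, [("store", "fetch", 0)]))
```

Since every quantifier ranges over the finite set $\mathbb{N}^+_k$, the unrolled script contains no quantifiers, and its size grows linearly in $k$ for these constraint kinds.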


Proposition 2. $\Phi$ is $k$-bounded schedulable iff $[\![\Phi]\!]_k$ is satisfiable. Moreover, if $[\![\Phi]\!]_k$ is satisfiable, then $[\![\Phi]\!]_{k'}$ is satisfiable for all $k' \leq k$.


4.3 A Sound Algorithm for the Scheduling Problem 


According to Theorem 1 and Propositions 1 and 2, (1) if $[\![\Phi]\!]$ is satisfiable, then $\Phi$ is schedulable, and (2) if $[\![\Phi]\!]_k$ is unsatisfiable for some $k \in \mathbb{N}^+$, then $\Phi$ is unschedulable. From these facts we can derive a sound algorithm for the general scheduling problem. However, choosing bounds $k$ at random and checking whether $[\![\Phi]\!]_k$ is unsatisfiable may be inefficient: since the $k$-bounded scheduling problem is NP-hard (cf. Theorem 2), a large bound $k$ may lead to a timeout, whereas for a small bound $k$, $[\![\Phi]\!]_k$ may be satisfiable and thus inconclusive. Indeed, given a maximal bound $B$, the random approach may have to invoke the SMT solver $O(B)$ times. Instead, we propose a binary-search based approach, shown in Algorithm 1, which for a given maximal bound $B$ invokes the SMT solver at most $O(\lceil \log_2 B \rceil)$ times.


Algorithm 1: A sound algorithm for the scheduling problem

Input: a set of constraints $\Phi$, a timeout threshold $T$, a maximal bound $B$
Output: $\{\mathrm{SAT}, \mathrm{UNSAT}, \mathrm{Timeout}\} \times \mathbb{N}$

result1 ← SMTSolver($[\![\Phi]\!]$, T);
if result1 = SAT then                          /* Schedulable */
    return (SAT, 0);
l ← 0; u ← B;
while l ≤ u do                                 /* Binary search */
    k ← ⌊(l + u)/2⌋;
    result2 ← SMTSolver($[\![\Phi]\!]_k$, T);
    if result2 = SAT then l ← k + 1;           /* Upper half */
    else                                       /* Lower half */
        u ← k − 1;
        if result1 = UNSAT ∨ result2 = UNSAT then
            result1 ← UNSAT;
if result2 ≠ SAT then k ← k − 1;
return (result1, k);
Given a set $\Phi$ of CCSL constraints, a timeout threshold $T$ and a maximal bound $B$, Algorithm 1 first invokes an SMT solver to decide within time $T$ whether $[\![\Phi]\!]$ is satisfiable. If it is, Algorithm 1 returns (SAT, 0), meaning that $\Phi$ is schedulable. Otherwise, it uses binary search to find a bound $k \leq B$ such that $[\![\Phi]\!]_k$ is satisfiable while $[\![\Phi]\!]_{k+1}$ (if $k + 1 \leq B$) is unsatisfiable or cannot be decided within time $T$.
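The control flow of Algorithm 1 can be sketched in Python with the SMT solver abstracted as a callback. Here `solve(None)` stands for SMTSolver($[\![\Phi]\!]$, T) and `solve(k)` for SMTSolver($[\![\Phi]\!]_k$, T); both the callback interface and the string results are assumptions, and the mock solver in the usage below is hypothetical.

```python
from typing import Callable, Optional, Tuple

SAT, UNSAT, TIMEOUT = "SAT", "UNSAT", "Timeout"


def schedulability(solve: Callable[[Optional[int]], str], B: int) -> Tuple[str, int]:
    """Sketch of Algorithm 1; solve returns SAT, UNSAT or TIMEOUT."""
    result1 = solve(None)                  # unbounded encoding [[Phi]]
    if result1 == SAT:
        return SAT, 0                      # schedulable
    lo, hi = 0, B
    k, result2 = 0, SAT
    while lo <= hi:                        # binary search on the bound
        k = (lo + hi) // 2
        result2 = solve(k)                 # bounded encoding [[Phi]]_k
        if result2 == SAT:
            lo = k + 1                     # upper half
        else:
            hi = k - 1                     # lower half
            if result1 == UNSAT or result2 == UNSAT:
                result1 = UNSAT            # an unsat bound proves unschedulability
    if result2 != SAT:
        k -= 1                             # last probe failed: report k - 1
    return result1, k


def mock(k: Optional[int]) -> str:
    """Hypothetical solver: bounded queries succeed up to k = 37, fail beyond,
    and the unbounded query times out."""
    if k is None:
        return TIMEOUT
    return SAT if k <= 37 else UNSAT


print(schedulability(mock, 100))           # ('UNSAT', 37)
```

With this mock and $B = 100$, the sketch issues seven solver calls in total (one unbounded query and six bounded ones), in line with the logarithmic bound stated above.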


Theorem 3. Algorithm 1 has the following three properties:

1. If it returns (SAT, 0), then $\Phi$ is schedulable.
2. If it returns (UNSAT, $k$), then $\Phi$ is unschedulable. If $k \neq 0$, then $\Phi$ is $k$-bounded schedulable; otherwise, $\Phi$ has no bounded schedule.
3. If it returns (Timeout, $k$), then $\Phi$ is $k$-bounded schedulable if $k \neq 0$; otherwise, no bounded schedule is found for $\Phi$.


5 Case Study and Performance Evaluation 


We implemented our approach in a prototype tool with Z3 [25] as its underlying 
SMT solver. We conduct a case study on expressing requirements of an inter- 
locking system in CCSL constraints and analyzing its schedulability. Then, we 
prove 12 algebraic properties of CCSL constraints using the tool. Finally, we 
evaluate the performance of the tool using 9 sets of CCSL constraints. 


5.1 Schedulability of an Interlocking System 


The interlocking system is a subsystem of a rail transit system. It is used to prevent trains from colliding and derailing when they move under the control of signal lights. As shown in Fig. 3, the interlocking system monitors the occupancy status of the individual track sections and sends signals to inform drivers whether they are allowed to enter the route. The railway tracks are divided into sections. Each section is associated with a track circuit that detects whether it is occupied by a train. Signal lights are placed between track sections. They can be green or red to indicate proceeding and stopping, respectively.

Fig. 3. Interlocking system

The mechanism and operation procedure of the interlocking system are summarized as follows.


1. To enter a track, a train first sends a request to the control center. 
2. On receiving the request, the control center sends an inquiry to the track 
circuit to detect the status of the track. 
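3. If the track is occupied, the track circuit sends checkFail to the control center; otherwise, it sends checkSucc.
4. On receiving the message checkFail (resp. checkSucc), the control center sends a red (resp. green) signal pulse to the signal light.
5. The signal light turns red (resp. green) on receiving the red (resp. green) signal pulse.
6. The train enters after seeing that the light is green, and the track becomes occupied. In case of a red light, the train must stop and wait.
7. The track becomes unoccupied after the train leaves. If the train is waiting, it must send a request again after some time.

Table 2. CCSL constraints of the interlocking system

$\mathit{request} \prec \mathit{inquiry}$
$\mathit{checkFail} \prec \mathit{redPulse}$
$\mathit{redPulse} = \mathit{showRed}$
$\mathit{showRed} \prec \mathit{wait}$
$\mathit{checkSucc} \prec \mathit{greenPulse}$
$\mathit{greenPulse} = \mathit{showGreen}$
$\mathit{showGreen} \prec \mathit{enter}$
$\mathit{enter} \prec \mathit{leave}$
$\mathit{enter} \subseteq \mathit{getOccupied}$
$\mathit{leave} \subseteq \mathit{getUnoccupied}$

$\mathit{getOccupied} \sim \mathit{tmp}_1$
$\mathit{getUnoccupied} \sim \mathit{tmp}_1$
$\mathit{checkFail} \subseteq \mathit{tmp}_1$
$\mathit{tmp} \preccurlyeq \mathit{getOccupied}$
$\mathit{getUnoccupied} \sim \mathit{tmp}_2$
$\mathit{checkSucc} \subseteq \mathit{tmp}_2$

$\mathit{responseOfTrack} \triangleq \mathit{checkSucc} + \mathit{checkFail}$
$\mathit{responseOfTrain} \triangleq \mathit{enter} + \mathit{wait}$
$\mathit{inquiry} \prec \mathit{responseOfTrack}$
$\mathit{getOccupied} \sim \mathit{getUnoccupied}$
$\mathit{getOccupied} \# \mathit{getUnoccupied}$
$\mathit{request} \sim \mathit{responseOfTrain}$
$\mathit{inquiry} \to \mathit{responseOfTrack} \leq 40$
$\mathit{greenPulse} \to \mathit{showGreen} \leq 30$
$\mathit{redPulse} \to \mathit{showRed} \leq 30$
$\mathit{request} \to \mathit{responseOfTrain} \leq 50$
$\mathit{checkFail} \to \mathit{showRed} \leq 40$
$\mathit{checkSucc} \to \mathit{showGreen} \leq 40$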


There are timing constraints on the above operations. For instance, the control center needs to get a response from the track circuit within 30 ms after sending it an inquiry. The train must make a decision within 50 ms after it sends a request to the control center. The light should turn to the corresponding color within 30 ms after it receives a pulse. After the track becomes occupied (resp. unoccupied), the light must turn red (resp. green) within 40 ms.

Table 2 shows the main logical constraints on the operations in the system and their timing constraints. For the sake of compactness, we use some non-standard constraint expressions. The constraint $a \to b \leq n$ denotes that $b$ must tick within $n$ steps after $a$ ticks. It is equivalent to the set of the following three constraints:

$$a \preccurlyeq b, \qquad t \triangleq a \mathbin{\$} n \text{ on } \mathbb{1}, \qquad b \preccurlyeq t.$$

Note that in this example the unit of time is the millisecond (ms). Thus, the constraints implicitly assume that every tick of a logical clock corresponds to the elapse of one millisecond.
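The expansion of $a \to b \leq n$ can be checked on a finite prefix. In this Python sketch, the auxiliary clock $t$ is computed from its delayFor definition (since $\chi(\mathbb{1}, i) = i - 1$, it ticks exactly $n$ steps after each tick of $a$) and both causality constraints are then evaluated as in Table 1. The clock name `t` (assumed not to clash with a real clock) and the helper names are assumptions, and only the steps of the prefix are checked.

```python
from typing import List, Set


def chi(s: List[Set[str]], c: str, n: int) -> int:
    """Ticks of clock c strictly before step n (1-indexed)."""
    return sum(1 for step in s[: n - 1] if c in step)


def causes(s: List[Set[str]], x: str, y: str) -> bool:
    """Causality x <= y: chi(x, n) >= chi(y, n) for every step n of the prefix."""
    return all(chi(s, x, n) >= chi(s, y, n) for n in range(1, len(s) + 1))


def within(s: List[Set[str]], a: str, b: str, n: int) -> bool:
    """Check a -> b <= n on a prefix via {a <= b, t = a $ n on 1, b <= t}."""
    # The reference clock 1 ticks at every step, so t ticks n steps after each a.
    t_steps = {m + n for m in range(1, len(s) + 1) if a in s[m - 1]}
    s_t = [step | {"t"} if i in t_steps else set(step)
           for i, step in enumerate(s, start=1)]
    return causes(s_t, a, b) and causes(s_t, b, "t")


on_time = [{"a"}, {"x"}, {"b"}, {"x"}, {"x"}]   # b two steps after a
late = [{"a"}, {"x"}, {"x"}, {"b"}, {"x"}]      # b three steps after a
print(within(on_time, "a", "b", 2), within(late, "a", "b", 2))  # True False
```

The filler clock `x` only keeps the steps non-empty, as Definition 2 requires.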
Fig. 4. A bounded schedule for the CCSL constraints in the case study


Most constraints in Table 2 are straightforward, except the six constraints marked with wavy underlines. The first three of them specify that checkFail can only occur between the occurrences of getUnoccupied and getOccupied. The others specify the following two requirements:

1. checkSucc can only occur after getUnoccupied and before getOccupied;
2. getUnoccupied precedes getOccupied.


Given these constraints, our tool found the bounded schedule depicted in Fig. 4. From step 1 to step 7, one complete process is finished. Initially, the track becomes unoccupied. At step 2, a request is made, which causes the subsequent operations from step 3 to step 7. At step 29, a failure case occurs because another train has entered (step 26) but has not yet left (it leaves at step 31). The train that made the request has to wait (step 33).

If we extend the bounded schedule by infinitely repeating, from step 70 on, the behaviors of all the clocks between steps 51 and 69, we obtain an infinite schedule. The extended schedule satisfies all the constraints, and thus it is a witness of the schedulability of the designed mechanism for the interlocking system.

In this paper, we are only concerned with the schedulability of the constraints in the example. Other kinds of temporal properties also need to be verified. For instance, we must guarantee that whenever a train requests to enter the station, it eventually enters. We also need to verify that the system is deadlock-free. Such temporal properties can be verified by LTL model checking of CCSL constraints using SMT techniques [40]. We omit this analysis because it is beyond the scope of this paper.


5.2 Automatic Proof of CCSL Algebraic Properties 


Using the proposed approach, we can also automatically prove algebraic properties of CCSL constraints, such as the commutativity of exclusion and the transitivity of causality. Algebraic properties of CCSL constraints can be represented as $\Phi \Rightarrow \phi$, where $\Phi$ is a set of CCSL constraints and $\phi$ is a constraint derived from $\Phi$. Proving that $\Phi \Rightarrow \phi$ is valid amounts to proving the unsatisfiability of $[\![\Phi]\!] \wedge \neg[\![\phi]\!]$, which can be done by Algorithm 1.
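Table 3. Proved algebraic properties of CCSL constraints

Algebraic property          | Definition
Commutativity of exclusion  | $c_1 \# c_2 \Rightarrow c_2 \# c_1$
Transitivity of causality   | $c_1 \preccurlyeq c_2,\ c_2 \preccurlyeq c_3 \Rightarrow c_1 \preccurlyeq c_3$
Antisymmetry of causality   | $c_1 \preccurlyeq c_2,\ c_2 \preccurlyeq c_1 \Rightarrow c_1 = c_2$
Fastness of infimum         | $c_1 \triangleq c_2 \wedge c_3 \Rightarrow c_1 \preccurlyeq c_2,\ c_1 \preccurlyeq c_3$
Slowestness of infimum      | $c_1 \triangleq c_2 \wedge c_3,\ c_4 \preccurlyeq c_2,\ c_4 \preccurlyeq c_3 \Rightarrow c_4 \preccurlyeq c_1$
Slowness of supremum        | $c_1 \triangleq c_2 \vee c_3 \Rightarrow c_2 \preccurlyeq c_1,\ c_3 \preccurlyeq c_1$
Fastestness of supremum     | $c_1 \triangleq c_2 \vee c_3,\ c_2 \preccurlyeq c_4,\ c_3 \preccurlyeq c_4 \Rightarrow c_1 \preccurlyeq c_4$
Causality of subclock       | $c_1 \subseteq c_2 \Rightarrow c_2 \preccurlyeq c_1$
Causality of union          | $c_1 \triangleq c_2 + c_3 \Rightarrow c_1 \preccurlyeq c_2,\ c_1 \preccurlyeq c_3$
Causality of intersection   | $c_1 \triangleq c_2 \ast c_3 \Rightarrow c_2 \preccurlyeq c_1,\ c_3 \preccurlyeq c_1$
Subclocking of sampling     | $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_3 \Rightarrow c_1 \subseteq c_3$
Subclocking of union        | $c_1 \triangleq c_2 + c_3 \Rightarrow c_2 \subseteq c_1,\ c_3 \subseteq c_1$
Subclocking of intersection | $c_1 \triangleq c_2 \ast c_3 \Rightarrow c_1 \subseteq c_2,\ c_1 \subseteq c_3$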


Let us consider the proof of the slowestness of infimum as an example. The slowestness of infimum means that an infimum constraint $c_1 \triangleq c_2 \wedge c_3$ defines the slowest clock $c_1$ among those that are faster than both $c_2$ and $c_3$.


Proposition 3 (Slowestness of infimum). Given two clocks $c_2, c_3$, let $c_1 \triangleq c_2 \wedge c_3$ and let $c_4$ be an arbitrary clock such that $c_4 \preccurlyeq c_2$ and $c_4 \preccurlyeq c_3$; then $c_4 \preccurlyeq c_1$.


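This is proved by transforming the CCSL constraints into the following SMT formula according to the SMT encoding method:

$[\![c_1 \triangleq c_2 \wedge c_3]\!] \wedge [\![c_4 \preccurlyeq c_2]\!] \wedge [\![c_4 \preccurlyeq c_3]\!] \wedge \neg[\![c_4 \preccurlyeq c_1]\!]$.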


Algorithm 1 returns (UNSAT, 0), which means that the formula is proved unsat- 
isfiable. The proposition is proved. 

Table 3 lists the algebraic properties that have been successfully proved with our approach. Algebraic properties help in understanding the relations among CCSL constraints. Using them, we can also check whether some constraints in a given set of CCSL constraints are redundant or inconsistent.


5.3 Performance Evaluation 


To evaluate the performance of our tool, we collected 9 sets of CCSL constraints from the literature and from real-world applications, and analyzed their schedulability using our tool. Under different time thresholds, we calculated the maximal bounds under which the constraints are schedulable.
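The text does not spell out how the maximal bound is searched for under a time threshold; one plausible reading is an incremental search that raises the bound until a check fails or the threshold expires. The sketch below assumes a hypothetical oracle `is_schedulable(k)` (for instance, one SMT query per bound), assumes that schedulability at bound k+1 implies schedulability at bound k, and only checks the threshold between two oracle calls, whereas a real tool would also abort a running query.

```python
import time
from typing import Callable, Tuple


def maximal_bound(is_schedulable: Callable[[int], bool], threshold_s: float,
                  clock: Callable[[], float] = time.monotonic) -> Tuple[int, bool]:
    """Increase the bound k = 1, 2, ... while time remains.

    Returns (bd, exact): bd is the largest bound found schedulable before the
    threshold expired; exact is True when bound bd + 1 was found unschedulable
    (the entries marked with an asterisk in Table 4).
    """
    start = clock()
    best, k = 0, 1
    while clock() - start < threshold_s:
        if not is_schedulable(k):      # e.g. one solver query for bound k
            return best, True
        best, k = k, k + 1
    return best, False
```

Passing the clock as a parameter keeps the function testable with a simulated clock instead of real solver time.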

Table 4 shows all the experimental results, including the corresponding execution times. All the experiments were conducted on Windows 10 running on an i7 CPU at 2.70 GHz with 16 GB of memory. The numbers followed by asterisks are the maximal bounds such that the corresponding constraints are bounded schedulable, but unschedulable in the next step.

Table 4. Experimental results of bounded schedulability analysis

CS    Clks.  Cons.  | THD: 10s     | THD: 20s     | THD: 30s     | THD: 40s
                    | BD    TM     | BD    TM     | BD    TM     | BD    TM
CS1   3      3      | 8     0.06   | 8     0.06   | 8     0.06   | 8     0.06
CS2   3      4      | 2*    0.06   | 2*    0.06   | 2*    0.06   | 2*    0.06
CS3   8      9      | 48    6.20   | 59    15.88  | 70    28.72  | 75    39.82
CS4   8      7      | 70    7.12   | 70    7.12   | 70    7.12   | 70    7.12
CS5   9      9      | 80    8.29   | 90    19.95  | 110   26.81  | 111   39.84
CS6   10     6      | 95    9.40   | 113   14.26  | 113   14.26  | 113   14.26
CS7   12     9      | 69    8.80   | 76    19.42  | 89    27.69  | 95    40.00
CS8   17     20     | 16    0.81   | 16    0.81   | 16*   27.36  | 16*   27.36
CS9   27     51     | 30    9.94   | 41    17.19  | 45    29.78  | 45    29.78

Remarks: CS: constraint set, Clks.: the number of clocks, Cons.: the number of constraints, THD: timeout threshold, TM: time (seconds), BD: upper bound.

It is interesting to observe from Table 4 that the time cost is only loosely related to the problem size (the number of clocks and constraints), thanks to the efficient search strategies of SMT solvers. This is in striking contrast to the automata-based [29,35] and rewriting-based [38] approaches, whose scalability suffers as the numbers of clocks and constraints grow.


6 Related Work 


CCSL is directly derived from the family of synchronous languages, such as Lustre [9], Esterel [6] and Signal [5], and the scheduling problem of CCSL is akin to what synchronous languages call clock calculus. The main differences are that CCSL is a specification language while the others are programming languages, and that CCSL partially describes what is expected to happen in a declarative way and does not give a direct operational deterministic description of what must happen. Furthermore, CCSL only deals with pure clocks, while the others deal with signals and extract the clocks when needed.

The Esterel compiler [31] applies a constructive approach to decide when a signal must occur (compute its clock) and what its value should be. This requires detecting causality cycles, or intra-cycle data dependencies, which are also naturally addressed by our approach. However, the Esterel compiler compiles an imperative program into a Boolean circuit, or equivalently a finite state machine. Consequently, it cannot deal with unbounded CCSL schedules.

The clock calculus in Signal attempts to detect whether the specification is endochronous [30], in which case it can generate efficient code. This analysis is mainly based on the subclock relationship that also exists in CCSL. In CCSL, we consider the problem of whether there is at least one possible schedule.

In Lustre and its extensions, clocks are regarded as abstract types [13], and the clock calculus computes the relative rates of clocks, rejecting the program when computing the rates is not possible. In most cases, the compiler attempts to build bounded buffers and to ensure that functional determinism can be preserved with a finite memory. In our case, we do not seek to reach a finite representation, as in the first specification steps this is not a primary goal for the designers. Indeed, this might lead to an over-specification of the problem.

The classical real-time scheduling problem [32] usually relies on task models, arrival patterns and constraints (e.g., precedence, resources) to propose algorithms with analytical results [19] or heuristics, depending on the specific model (e.g., priorities, preemption). Other solutions, based on timed automata [1,2,17] or timed Petri nets [8,18], propose a general framework for describing all the relevant aspects without assuming a specific task model. CCSL offers an alternative method based on logical time. Logical time and multiform time bases are believed to offer some flexibility to unify functional requirements and performance constraints. We rely on CCSL and claim that, after encoding a task model in CCSL, finding a schedule for the CCSL model also gives a schedule for the encoded task model [24].

Many efforts have been made towards the scheduling problem of CCSL, though its decidability remains open. TIMESQUARE [14] is a simulation tool for CCSL which can produce a possible schedule for a given set of CCSL constraints, up to a given user-defined bound. It also supports different simulation strategies for producing desired execution traces. Earlier work [20] defines the notion of safe CCSL specifications, which can be encoded with a finite-state machine. The scheduling problem is decidable for safe specifications, as one can merely enumerate all the (finite) solutions. A semi-algorithm can build the finite representation when the specification is safe [21]. In [37], Zhang et al. proposed a state-based approach and a sufficient condition to decide whether safe and unsafe specifications accept a so-called periodic schedule [39]. This makes it possible to build a finite solution for unsafe specifications, although infinite solutions may also exist. Xu et al. proposed a notion of divergence of CCSL to study the schedulability of CCSL, and proved that a set of CCSL constraints is schedulable if all the constraints are divergent [34]. They used the theorem prover PVS [27] to assist with the divergence proofs.

In this work, we resort to SMT solving to deal with both bounded and unbounded schedules of CCSL constraints. Using SMT solving has two advantages: (1) it is usually efficient in practice, and (2) it can deal with unsafe CCSL constraints such as infimum and supremum [21].

Some basic algebraic properties of CCSL relations have been established manually before [23], but here we provide an automatic framework to do so.


7 Conclusion and Future Work 


In this work, we proved that the bounded scheduling problem of CCSL is NP-complete, and proposed an SMT-based decision procedure for the bounded scheduling problem. The procedure is sound and complete. The experimental results also show its efficiency in practice. Based on this decision procedure, we devised a sound algorithm for the general scheduling problem. We evaluated the effectiveness of the proposed approach on an interlocking system. We also showed that our approach can be used to prove algebraic properties of CCSL constraints.

Our approach to the bounded scheduling problem of CCSL brings us one step closer to tackling the general (i.e., unbounded) scheduling problem. As the case study demonstrates, one may find an infinite schedule by extending a bounded one such that the extended infinite schedule still satisfies the constraints. This observation inspires future work to investigate mechanisms for finding such bounded schedules, hopefully with SMT solvers, by extending our algorithm. In our earlier work [37], we proposed a similar approach to search for periodic schedules in bounded steps. In that approach, CCSL constraints are transformed into finite state machines and consequently suffer from the state explosion problem. We believe our SMT-based approach can be extended to that setting while still avoiding state explosion. We leave this to future work.
Abstract. We propose $\mathcal{E}^{\downarrow}$-logic as a formal foundation for the specification and development of event-based systems with local data states. The logic is intended to cover a broad range of abstraction levels, from abstract requirements specifications up to constructive specifications. Our logic uses diamond and box modalities over structured actions adopted from dynamic logic. Atomic actions are pairs $e /\!/ \psi$, where $e$ is an event and $\psi$ a state transition predicate capturing the allowed reactions to the event. To write concrete specifications of recursive process structures, we integrate (control) state variables and binders of hybrid logic. The semantic interpretation relies on event/data transition systems; specification refinement is defined by model class inclusion. For the presentation of constructive specifications we propose operational event/data specifications, allowing for familiar, diagrammatic representations by state transition graphs. We show that $\mathcal{E}^{\downarrow}$-logic is powerful enough to characterise the semantics of an operational specification by a single $\mathcal{E}^{\downarrow}$-sentence. Thus the whole development process can rely on $\mathcal{E}^{\downarrow}$-logic and its semantics as a common basis. This also includes a variety of implementation constructors to support, among others, event refinement and parallel composition.


1 Introduction 


Event-based systems are an important kind of software systems which are open to the environment to react to certain events. A crucial characteristic of such systems is that not every event can (or should) be expected at any time. Hence the control flow of the system is significant and should be modelled by appropriate means. On the other hand, components manage data which may change upon the occurrence of an event. Thus the specification of admissible data changes caused by events also plays a major role.
There is quite a lot of literature on modelling and specification of event-based systems. Many approaches, often underpinned by graphical notations, provide formalisms aiming at being constructive enough to suggest particular designs or implementations, e.g., Event-B [1,7], symbolic transition systems [17], and UML behavioural and protocol state machines [12,16]. On the other hand, there are logical formalisms to express desired properties of event-based systems. Among them are temporal logics integrating state and event-based styles [4], and various kinds of modal logics involving data, like first-order dynamic logic [10] or the modal μ-calculus with data and time [9]. The gap between logics and constructive specification is usually filled by checking whether the model of a constructive specification satisfies certain logical formulae.

In this paper we are interested in investigating a logic which is capable of expressing properties of event/data-based systems on various abstraction levels in a common formalism. For this purpose we follow ideas of [15], where, however, data states, effects of events on them and constructive operational specifications (see below) were not considered. The advantage of an expressive logic is that we can split the transition from system requirements to system implementation into a series of gradual refinement steps which are easier to understand, to verify, and to adjust when certain aspects of the system are to be changed or when a product line of similar products has to be developed.

To that end we propose $\mathcal{E}^{\downarrow}$-logic, a dynamic logic enriched with features of hybrid logic. The dynamic part uses diamond and box modalities over structured actions. Atomic actions are of the form $e /\!/ \psi$, with $e$ an event and $\psi$ a state transition predicate specifying the admissible effects of $e$ on the data. Using sequential composition, union, and iteration we obtain complex actions that, in connection with the modalities, can be used to specify required and forbidden behaviour. In particular, if $E$ is a finite set of events, then, even though the data may be infinite, we are able to capture all reachable states of the system and to express safety and liveness properties. But $\mathcal{E}^{\downarrow}$-logic is also powerful enough to specify concrete, recursive process structures by integrating state variables and binders from hybrid logic [6], with the subtle difference that our state variables are used to denote control states only. We show that the dynamic part of the logic is bisimulation invariant while the hybrid part, due to the ability to bind names to states, is not.

An axiomatic specification $\mathit{Sp} = (\Sigma, \mathit{Ax})$ in $\mathcal{E}^{\downarrow}$ is given by an event/data signature $\Sigma = (E, A)$, with a set $E$ of events and a set $A$ of attributes to model local data states, and a set $\mathit{Ax}$ of $\mathcal{E}^{\downarrow}$-sentences, called axioms, expressing requirements. For the semantic interpretation we use event/data transition systems (edts). Their states are reachable configurations $\gamma = (c, \omega)$, where $c$ is a control state, recording the current state of execution, and $\omega$ is a local data state, i.e., a valuation of the attributes. Transitions between configurations are labelled by events. The semantics of a specification $\mathit{Sp}$ is "loose" in the sense that it consists of all edts satisfying the axioms of the specification. Such structures are called models of $\mathit{Sp}$. Loose semantics allows us to define a simple refinement notion: $\mathit{Sp}_1$ refines to $\mathit{Sp}_2$ if the model class of $\mathit{Sp}_2$ is included in the model class of $\mathit{Sp}_1$. We may also say that $\mathit{Sp}_2$ is an implementation of $\mathit{Sp}_1$.
Our refinement process typically starts with axiomatic specifications whose axioms involve only the dynamic part of the logic. Hybrid features are successively added in refinements when specifying more concrete behaviours, like loops. Aiming at a concrete design, the use of an axiomatic specification style may, however, become cumbersome, since we also have to state explicitly all negative cases, i.e., what the system should not do. For a convenient presentation of constructive specifications we propose operational event/data specifications, which are a kind of symbolic transition systems, again equipped with a model class semantics in terms of edts. We will show that $\mathcal{E}^{\downarrow}$-logic, by use of the hybrid binder, is powerful enough to characterise the semantics of an operational specification. Therefore we have not really left $\mathcal{E}^{\downarrow}$-logic when refining axiomatic by operational specifications. Moreover, since several constructive notations in the literature, including (essential parts of) Event-B, symbolic transition systems, and UML protocol state machines, can be expressed as operational specifications, $\mathcal{E}^{\downarrow}$-logic provides a logical umbrella under which event/data-based systems can be developed.

In order to consider more complex refinements, we take up an idea of Sannella and Tarlecki [18,19], who proposed the notion of constructor implementation. This is a generic notion applicable to specification formalisms based on signatures and semantic structures for signatures. As both are available in the context of $\mathcal{E}^{\downarrow}$-logic, we complement our approach by introducing a couple of constructors, among them event refinement and parallel composition. For the latter we provide a useful refinement criterion relying on a relationship between syntactic and semantic parallel composition. The logic and the use of the implementation constructors will be illustrated by a running example.

Hereafter, in Sect. 2, we introduce the syntax and semantics of $\mathcal{E}^{\downarrow}$-logic. In Sect. 3, we consider axiomatic as well as operational specifications and demonstrate the expressiveness of $\mathcal{E}^{\downarrow}$-logic. Refinement of both types of specifications using several implementation constructors is considered in Sect. 4. Section 5 provides some concluding remarks. Proofs of theorems and facts can be found in [11].


2 A Hybrid Dynamic Logic for Event/Data Systems 


We propose the logic $\mathcal{E}^{\downarrow}$ to specify and reason about event/data-based systems. $\mathcal{E}^{\downarrow}$-logic extends the hybrid dynamic logic considered in [15] by taking changing data into account. Therefore, we first summarise the underlying notions used for the treatment of data. We then introduce the syntax and semantics of $\mathcal{E}^{\downarrow}$ with its hybrid and dynamic logic features applied to events and data.


2.1 Data States 


We assume given a universe $\mathcal{D}$ of data values. A data signature is given by a set $A$ of attributes. An $A$-data state $\omega$ is a function $\omega : A \to \mathcal{D}$. We denote by $\Omega(A)$ the set of all $A$-data states. For any data signature $A$, we assume given a set $\Phi(A)$ of state predicates to be interpreted over single $A$-data states, and a set $\Psi(A)$ of transition predicates to be interpreted over pairs of pre- and post-$A$-data states. The concrete syntax of state and transition predicates is of no particular importance for the following. For an attribute $a \in A$, a state predicate may be $a > 0$, and a transition predicate may be, e.g., $a' = a + 1$, where $a$ refers to the value of attribute $a$ in the pre-data state and $a'$ to its value in the post-data state. Still, both types of predicates are assumed to contain $\mathrm{true}$ and to be closed under negation (written $\neg$) and disjunction (written $\vee$); as usual, we will then also use $\mathrm{false}$, $\wedge$, etc. Furthermore, we assume for each $A_0 \subseteq A$ a transition predicate $\mathrm{id}_{A_0} \in \Psi(A)$ expressing that the values of the attributes in $A_0$ are the same in pre- and post-$A$-data states.

We write $\omega \models^{\mathcal{D}}_{A} \varphi$ if $\varphi \in \Phi(A)$ is satisfied in data state $\omega$, and $(\omega_1, \omega_2) \models^{\mathcal{D}}_{A} \psi$ if $\psi \in \Psi(A)$ is satisfied in the pre-data state $\omega_1$ and post-data state $\omega_2$. In particular, $(\omega_1, \omega_2) \models^{\mathcal{D}}_{A} \mathrm{id}_{A_0}$ if, and only if, $\omega_1(a_0) = \omega_2(a_0)$ for all $a_0 \in A_0$.
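To make the data layer concrete, here is a small Python sketch in which, as an assumption, data values are integers, a data state is a dictionary from attribute names to values, and state and transition predicates are plain Boolean functions. The names `t_not`, `t_or` and `ident` are hypothetical; they mirror the closure under negation and disjunction and the predicate $\mathrm{id}_{A_0}$.

```python
from typing import Callable, Dict, Iterable

DataState = Dict[str, int]                            # omega : A -> D, with D = int here
StatePred = Callable[[DataState], bool]               # an element of Phi(A)
TransPred = Callable[[DataState, DataState], bool]    # an element of Psi(A)


def t_not(psi: TransPred) -> TransPred:
    """Negation of a transition predicate."""
    return lambda pre, post: not psi(pre, post)


def t_or(psi1: TransPred, psi2: TransPred) -> TransPred:
    """Disjunction of two transition predicates."""
    return lambda pre, post: psi1(pre, post) or psi2(pre, post)


def ident(a0: Iterable[str]) -> TransPred:
    """id_{A0}: every attribute in A0 keeps its value."""
    attrs = tuple(a0)
    return lambda pre, post: all(pre[a] == post[a] for a in attrs)


# The examples from the text: the state predicate a > 0 and the transition
# predicate a' = a + 1 (post refers to primed values).
a_positive: StatePred = lambda w: w["a"] > 0
a_incr: TransPred = lambda pre, post: post["a"] == pre["a"] + 1
```

With `pre = {"a": 1, "b": 7}` and `post = {"a": 2, "b": 7}`, the pair satisfies `a_incr` and `ident(["b"])` but not `ident(["a", "b"])`.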


2.2 $\mathcal{E}^{\downarrow}$-Logic


Definition 1. An event/data signature (ed signature, for short) $\Sigma = (E, A)$ consists of a finite set of events $E$ and a data signature $A$. We write $E(\Sigma)$ for $E$ and $A(\Sigma)$ for $A$. We also write $\Omega(\Sigma)$ for $\Omega(A(\Sigma))$, $\Phi(\Sigma)$ for $\Phi(A(\Sigma))$, and $\Psi(\Sigma)$ for $\Psi(A(\Sigma))$. The class of ed signatures is denoted by $\mathrm{Sig}^{\mathcal{E}^{\downarrow}}$.


Any ed signature $\Sigma$ determines a class of semantic structures, the event/data transition systems, which are reachable transition systems with sets of initial states and events as labels on transitions. The states are pairs $\gamma = (c, \omega)$, called configurations, where $c$ is a control state recording the current execution state and $\omega$ is an $A(\Sigma)$-data state; we write $c(\gamma)$ for $c$ and $\omega(\gamma)$ for $\omega$.


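Definition 2. A $\Sigma$-event/data transition system ($\Sigma$-edts, for short) $M = (\Gamma, R, \Gamma_0)$ over an ed signature $\Sigma$ consists of a set of configurations $\Gamma \subseteq C \times \Omega(\Sigma)$ for a set of control states $C$; a family of transition relations $R = (R_e \subseteq \Gamma \times \Gamma)_{e \in E(\Sigma)}$; and a non-empty set of initial configurations $\Gamma_0 \subseteq \{c_0\} \times \Omega_0$ for an initial control state $c_0 \in C$ and a set of initial data states $\Omega_0 \subseteq \Omega(\Sigma)$, such that $\Gamma$ is reachable via $R$, i.e., for all $\gamma \in \Gamma$ there are $\gamma_0 \in \Gamma_0$, $n \geq 0$, $e_1, \ldots, e_n \in E(\Sigma)$, and $(\gamma_i, \gamma_{i+1}) \in R_{e_{i+1}}$ for all $0 \leq i < n$ with $\gamma_n = \gamma$. We write $\Gamma(M)$ for $\Gamma$, $C(M)$ for $C$, $R(M)$ for $R$, $c_0(M)$ for $c_0$, $\Omega_0(M)$ for $\Omega_0$, and $\Gamma_0(M)$ for $\Gamma_0$. The class of $\Sigma$-edts is denoted by $\mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma)$.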
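For finite systems, the side conditions of Definition 2 can be checked mechanically. The Python sketch below is one possible encoding (our assumption): a configuration is a pair of a control-state name and a sorted tuple of attribute/value pairs, so that it is hashable, and each $R_e$ is a set of configuration pairs. The class `Edts` and its method `is_well_formed` are hypothetical; the method checks that there is a single initial control state, that all transitions stay within $\Gamma$, and that every configuration is reachable from $\Gamma_0$.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# A configuration gamma = (c, omega): control state plus a frozen (hashable) data state.
Config = Tuple[str, Tuple[Tuple[str, object], ...]]


def cfg(control: str, **data: object) -> Config:
    """Build a configuration; attributes are sorted so equal data states compare equal."""
    return (control, tuple(sorted(data.items())))


@dataclass
class Edts:
    configs: Set[Config]                              # Gamma
    trans: Dict[str, Set[Tuple[Config, Config]]]      # R_e for each event e
    init: Set[Config]                                 # Gamma_0

    def is_well_formed(self) -> bool:
        """Check the side conditions of Definition 2 on a finite edts."""
        if not self.init or not self.init <= self.configs:
            return False
        if len({c for c, _ in self.init}) != 1:       # a single initial control state c_0
            return False
        if any(s not in self.configs or t not in self.configs
               for rel in self.trans.values() for s, t in rel):
            return False                               # R_e must be a subset of Gamma x Gamma
        seen, frontier = set(self.init), list(self.init)
        while frontier:                                # every configuration reachable via R
            g = frontier.pop()
            for rel in self.trans.values():
                for s, t in rel:
                    if s == g and t not in seen:
                        seen.add(t)
                        frontier.append(t)
        return seen == self.configs
```

A two-configuration switch, `Edts({g0, g1}, {"switch": {(g0, g1), (g1, g0)}}, {g0})`, is well formed; adding an unreachable configuration breaks the reachability condition.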


Atomic actions are given by expressions of the form $e /\!/ \psi$ with $e$ an event and $\psi$ a state transition predicate. The intuition is that the occurrence of the event $e$ causes a state transition in accordance with $\psi$, i.e., the pre- and post-data states satisfy $\psi$, and $\psi$ specifies the possible effects of $e$. Following the ideas of dynamic logic, we also use complex, structured actions formed over atomic actions by union, sequential composition and iteration. All kinds of actions over an ed signature $\Sigma$ are called $\Sigma$-event/data actions ($\Sigma$-ed actions, for short). The set $\mathcal{A}(\Sigma)$ of $\Sigma$-ed actions is defined by the grammar

$\alpha ::= e /\!/ \psi \mid \alpha_1 + \alpha_2 \mid \alpha_1; \alpha_2 \mid \alpha^*$
where $e \in E(\Sigma)$ and $\psi \in \Psi(\Sigma)$. We use the following shorthand notations for actions: for a subset $F = \{e_1, \ldots, e_k\} \subseteq E(\Sigma)$, we use the notation $F$ to denote the complex action $e_1 /\!/ \mathrm{true} + \ldots + e_k /\!/ \mathrm{true}$ and $-F$ to denote the action $E(\Sigma) \setminus F$. For the action $E(\Sigma)$ we will write $E$. For $e \in E(\Sigma)$, we use the notation $e$ to denote the action $e /\!/ \mathrm{true}$ and $-e$ to denote the action $E \setminus \{e\}$. Hence, if $E(\Sigma) = \{e_1, \ldots, e_n\}$ and $e_i \in E(\Sigma)$, the action $-e_i$ stands for $e_1 /\!/ \mathrm{true} + \ldots + e_{i-1} /\!/ \mathrm{true} + e_{i+1} /\!/ \mathrm{true} + \ldots + e_n /\!/ \mathrm{true}$.
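The actions $\mathcal{A}(\Sigma)$ are interpreted over a $\Sigma$-edts $M$ as the family of relations $(R(M)_\alpha \subseteq \Gamma(M) \times \Gamma(M))_{\alpha \in \mathcal{A}(\Sigma)}$ defined by

- $R(M)_{e /\!/ \psi} = \{(\gamma, \gamma') \in R(M)_e \mid (\omega(\gamma), \omega(\gamma')) \models^{\mathcal{D}}_{\Sigma} \psi\}$,
- $R(M)_{\alpha_1 + \alpha_2} = R(M)_{\alpha_1} \cup R(M)_{\alpha_2}$, i.e., union of relations,
- $R(M)_{\alpha_1; \alpha_2} = R(M)_{\alpha_1} ; R(M)_{\alpha_2}$, i.e., sequential composition of relations,
- $R(M)_{\alpha^*} = (R(M)_{\alpha})^*$, i.e., reflexive-transitive closure of relations.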
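On a finite edts, the relations $R(M)_\alpha$ can be computed directly from the four clauses above. The Python sketch below assumes hashable configurations (a control-state name paired with a sorted tuple of attribute/value pairs) and represents actions as tagged tuples; `atom`, `plus`, `seq`, `star` and `interpret` are hypothetical names. The star case computes the reflexive-transitive closure as a fixpoint, which terminates because the set of configurations is finite.

```python
from typing import Callable, Dict, Set, Tuple

Config = Tuple[str, Tuple[Tuple[str, object], ...]]   # (control state, frozen data)
Rel = Set[Tuple[Config, Config]]
TransPred = Callable[[dict, dict], bool]


# Actions as tagged tuples (our own encoding): atomic e//psi, union, sequence, star.
def atom(e: str, psi: TransPred): return ("atom", e, psi)
def plus(a1, a2): return ("+", a1, a2)
def seq(a1, a2): return (";", a1, a2)
def star(a): return ("*", a)


def compose(r1: Rel, r2: Rel) -> Rel:
    """Sequential composition r1 ; r2 of two relations."""
    return {(g, k) for (g, h) in r1 for (h2, k) in r2 if h == h2}


def interpret(alpha, trans: Dict[str, Rel], configs: Set[Config]) -> Rel:
    """R(M)_alpha over a finite edts given by its configurations and R_e relations."""
    tag = alpha[0]
    if tag == "atom":
        _, e, psi = alpha
        return {(g, h) for (g, h) in trans.get(e, set())
                if psi(dict(g[1]), dict(h[1]))}
    if tag == "+":
        return interpret(alpha[1], trans, configs) | interpret(alpha[2], trans, configs)
    if tag == ";":
        return compose(interpret(alpha[1], trans, configs),
                       interpret(alpha[2], trans, configs))
    if tag == "*":
        step = interpret(alpha[1], trans, configs)
        closure = {(g, g) for g in configs}          # reflexive part
        while True:                                   # add steps until a fixpoint
            extended = closure | compose(closure, step)
            if extended == closure:
                return closure
            closure = extended
    raise ValueError(f"unknown action {alpha!r}")
```

For the two-configuration switch with `trans = {"switch": {(g0, g1), (g1, g0)}}`, the action `seq(s, s)` with `s = atom("switch", lambda pre, post: True)` relates each configuration to itself, and `star(s)` relates every configuration to every other one.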


To define the event/data formulae of $\mathcal{E}^{\downarrow}$, we assume given a countably infinite set $X$ of control state variables, which are used in formulae to denote the control part of a configuration. They can be bound by the binder operator ${\downarrow}x$ and accessed by the jump operator $@x$ of hybrid logic. The dynamic part of our logic is due to the modalities, which can be formed over any ed action over a given ed signature. $\mathcal{E}^{\downarrow}$ thus retains from hybrid logic the use of binders, but omits free nominals. Thus sentences of the logic can only express properties of configurations reachable from the initial ones.


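Definition 3. The set $\mathrm{Frm}^{\mathcal{E}^{\downarrow}}(\Sigma)$ of $\Sigma$-ed formulae over an ed signature $\Sigma$ is given by

$\varrho ::= \varphi \mid x \mid {\downarrow}x.\varrho \mid @x.\varrho \mid \langle\alpha\rangle\varrho \mid \mathrm{true} \mid \neg\varrho \mid \varrho_1 \vee \varrho_2$

where $\varphi \in \Phi(\Sigma)$, $x \in X$, and $\alpha \in \mathcal{A}(\Sigma)$. We write $[\alpha]\varrho$ for $\neg\langle\alpha\rangle\neg\varrho$, and we use the usual boolean connectives as well as the constant $\mathrm{false}$ to denote $\neg\mathrm{true}$.¹ The set $\mathrm{Sen}^{\mathcal{E}^{\downarrow}}(\Sigma)$ of $\Sigma$-ed sentences consists of all $\Sigma$-ed formulae without free variables, where the free variables are defined as usual, with ${\downarrow}x$ being the unique operator binding variables.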


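Given an ed signature $\Sigma$ and a $\Sigma$-edts $M$, the satisfaction of a $\Sigma$-ed formula $\varrho$ is inductively defined w.r.t. valuations $v : X \to C(M)$, mapping variables to control states, and configurations $\gamma \in \Gamma(M)$:

- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varphi$ iff $\omega(\gamma) \models^{\mathcal{D}}_{\Sigma} \varphi$;
- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} x$ iff $c(\gamma) = v(x)$;
- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} {\downarrow}x.\varrho$ iff $M, v\{x \mapsto c(\gamma)\}, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$;
- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} @x.\varrho$ iff $M, v, \gamma' \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$ for all $\gamma' \in \Gamma(M)$ with $c(\gamma') = v(x)$;
- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \langle\alpha\rangle\varrho$ iff $M, v, \gamma' \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$ for some $\gamma' \in \Gamma(M)$ with $(\gamma, \gamma') \in R(M)_\alpha$;
- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \mathrm{true}$ always holds;
- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \neg\varrho$ iff $M, v, \gamma \not\models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$;
- $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho_1 \vee \varrho_2$ iff $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho_1$ or $M, v, \gamma \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho_2$.

¹ We use true and false for predicates and formulae; their meaning will always be clear from the context. For boolean values we use the notations tt and ff instead.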


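If $\varrho$ is a sentence, then the valuation is irrelevant. $M$ satisfies a sentence $\varrho \in \mathrm{Sen}^{\mathcal{E}^{\downarrow}}(\Sigma)$, denoted by $M \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$, if $M, \gamma_0 \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$ for all $\gamma_0 \in \Gamma_0(M)$.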

By borrowing the modalities from dynamic logic [9,10], $\mathcal{E}^{\downarrow}$ is able to express liveness and safety requirements, as illustrated in our running ATM example below. There we use the fact that we can state properties over all reachable states by sentences of the form $[E^*]\varrho$. In particular, deadlock-freedom can be expressed by $[E^*]\langle E\rangle\mathrm{true}$. The logic $\mathcal{E}^{\downarrow}$, however, is also suited to directly express process structures and, thus, the implementation of abstract requirements. The binder operator is essential for this. For example, we can specify a process which switches a boolean value, denoted by the attribute $\mathit{val}$, from $\mathrm{tt}$ to $\mathrm{ff}$ and back by the following sentence:


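${\downarrow}x.\, \mathit{val} = \mathrm{tt} \wedge \langle \mathit{switch} /\!/ \mathit{val}' = \mathrm{ff}\rangle \langle \mathit{switch} /\!/ \mathit{val}' = \mathrm{tt}\rangle x$.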
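For finite edts, the satisfaction clauses can be turned into a direct recursive evaluator. The Python sketch below is illustrative only: formulae and actions are encoded as tagged tuples (our own, hypothetical encoding), configurations are hashable pairs of a control state and a sorted tuple of attribute values, and $\wedge$ is derived from $\neg$ and $\vee$. The constant `SWITCH` encodes the switching sentence above.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Config = Tuple[str, Tuple[Tuple[str, object], ...]]   # (control state, sorted data)
Rel = Set[Tuple[Config, Config]]


@dataclass
class Edts:
    configs: Set[Config]
    trans: Dict[str, Rel]
    init: Set[Config]


def rel(m: Edts, alpha) -> Rel:
    """R(M)_alpha for ("atom", e, psi), ("+", a1, a2), (";", a1, a2), ("*", a)."""
    tag = alpha[0]
    if tag == "atom":
        return {(g, h) for g, h in m.trans.get(alpha[1], set())
                if alpha[2](dict(g[1]), dict(h[1]))}
    if tag == "+":
        return rel(m, alpha[1]) | rel(m, alpha[2])
    if tag == ";":
        r1, r2 = rel(m, alpha[1]), rel(m, alpha[2])
        return {(g, k) for g, h in r1 for h2, k in r2 if h == h2}
    if tag == "*":
        step, closure = rel(m, alpha[1]), {(g, g) for g in m.configs}
        while True:
            bigger = closure | {(g, k) for g, h in closure for h2, k in step if h == h2}
            if bigger == closure:
                return closure
            closure = bigger
    raise ValueError(f"unknown action {alpha!r}")


def holds(m: Edts, v: Dict[str, str], g: Config, f) -> bool:
    """M, v, gamma |= f, following the satisfaction clauses."""
    tag = f[0]
    if tag == "true":
        return True
    if tag == "pred":     # state predicate evaluated on omega(gamma)
        return f[1](dict(g[1]))
    if tag == "var":      # c(gamma) = v(x)
        return g[0] == v[f[1]]
    if tag == "bind":     # bind x to the current control state
        return holds(m, {**v, f[1]: g[0]}, g, f[2])
    if tag == "at":       # @x: all configurations whose control state is v(x)
        return all(holds(m, v, h, f[2]) for h in m.configs if h[0] == v[f[1]])
    if tag == "dia":      # <alpha> f: some alpha-successor satisfies f
        return any(holds(m, v, h, f[2]) for s, h in rel(m, f[1]) if s == g)
    if tag == "not":
        return not holds(m, v, g, f[1])
    if tag == "or":
        return holds(m, v, g, f[1]) or holds(m, v, g, f[2])
    raise ValueError(f"unknown formula {f!r}")


def conj(f1, f2):
    return ("not", ("or", ("not", f1), ("not", f2)))


def satisfies(m: Edts, sentence) -> bool:
    """M |= sentence: it holds in every initial configuration."""
    return all(holds(m, {}, g0, sentence) for g0 in m.init)


# down x. val = tt and <switch//val' = ff><switch//val' = tt> x
SWITCH = ("bind", "x", conj(
    ("pred", lambda w: w["val"] is True),
    ("dia", ("atom", "switch", lambda pre, post: post["val"] is False),
     ("dia", ("atom", "switch", lambda pre, post: post["val"] is True), ("var", "x")))))
```

The same evaluator separates the one-state and two-state edts of the example in Sect. 3.1 with the sentence ${\downarrow}x.\langle e /\!/ a' = a\rangle x$, whereas the hybrid-free deadlock-freedom sentence $[E^*]\langle E\rangle\mathrm{true}$ holds in both.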


2.3 Bisimulation and Invariance 


Bisimulation is a crucial notion in both behavioural systems specification and modal logics. On the specification side, it provides a standard way to identify systems with the same behaviour by abstracting from the internal specifics of the systems; this is also reflected on the logic side, where bisimulation frequently relates states that satisfy the same formulae. We explore some properties of $\mathcal{E}^{\downarrow}$ w.r.t. bisimilarity. Let us first introduce the notion of bisimilarity in the context of $\mathcal{E}^{\downarrow}$:


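Definition 4. Let $M_1, M_2$ be $\Sigma$-edts. A relation $B \subseteq \Gamma(M_1) \times \Gamma(M_2)$ is a bisimulation relation between $M_1$ and $M_2$ if for all $(\gamma_1, \gamma_2) \in B$ the following conditions hold:

(atom) for all $\varphi \in \Phi(\Sigma)$, $\omega(\gamma_1) \models^{\mathcal{D}}_{\Sigma} \varphi$ iff $\omega(\gamma_2) \models^{\mathcal{D}}_{\Sigma} \varphi$;

(zig) for all $e /\!/ \psi \in \mathcal{A}(\Sigma)$ and for all $\gamma'_1 \in \Gamma(M_1)$ with $(\gamma_1, \gamma'_1) \in R(M_1)_{e /\!/ \psi}$, there is a $\gamma'_2 \in \Gamma(M_2)$ such that $(\gamma_2, \gamma'_2) \in R(M_2)_{e /\!/ \psi}$ and $(\gamma'_1, \gamma'_2) \in B$;

(zag) for all $e /\!/ \psi \in \mathcal{A}(\Sigma)$ and for all $\gamma'_2 \in \Gamma(M_2)$ with $(\gamma_2, \gamma'_2) \in R(M_2)_{e /\!/ \psi}$, there is a $\gamma'_1 \in \Gamma(M_1)$ such that $(\gamma_1, \gamma'_1) \in R(M_1)_{e /\!/ \psi}$ and $(\gamma'_1, \gamma'_2) \in B$.

$M_1$ and $M_2$ are bisimilar, in symbols $M_1 \sim M_2$, if there exists a bisimulation relation $B \subseteq \Gamma(M_1) \times \Gamma(M_2)$ between $M_1$ and $M_2$ such that

(init) for any $\gamma_1 \in \Gamma_0(M_1)$, there is a $\gamma_2 \in \Gamma_0(M_2)$ such that $(\gamma_1, \gamma_2) \in B$, and for any $\gamma_2 \in \Gamma_0(M_2)$, there is a $\gamma_1 \in \Gamma_0(M_1)$ such that $(\gamma_1, \gamma_2) \in B$.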
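For finite edts, the greatest bisimulation can be computed by starting from all pairs allowed by (atom) and repeatedly discarding pairs that violate (zig) or (zag). The Python sketch below assumes, as a simplification, that state predicates can tell any two different data states apart, so that (atom) amounts to equality of data states; under that assumption it suffices to match $e /\!/ \mathrm{true}$ transitions, because matched configurations then have identical pre- and post-data states and therefore satisfy the same $\psi$. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Config = Tuple[str, Tuple[Tuple[str, object], ...]]   # (control state, frozen data)


@dataclass
class Edts:
    configs: Set[Config]
    trans: Dict[str, Set[Tuple[Config, Config]]]
    init: Set[Config]

    def succ(self, g: Config, e: str) -> Set[Config]:
        return {t for s, t in self.trans.get(e, set()) if s == g}


def greatest_bisimulation(m1: Edts, m2: Edts) -> Set[Tuple[Config, Config]]:
    """Largest relation satisfying (atom), (zig) and (zag), computed as a
    greatest fixpoint; (atom) is approximated by equality of data states."""
    events = set(m1.trans) | set(m2.trans)
    b = {(g1, g2) for g1 in m1.configs for g2 in m2.configs if g1[1] == g2[1]}
    changed = True
    while changed:
        changed = False
        for g1, g2 in list(b):
            ok = all(
                # zig: every e-step of g1 is matched by an e-step of g2
                all(any((h1, h2) in b for h2 in m2.succ(g2, e)) for h1 in m1.succ(g1, e))
                # zag: every e-step of g2 is matched by an e-step of g1
                and all(any((h1, h2) in b for h1 in m1.succ(g1, e)) for h2 in m2.succ(g2, e))
                for e in events)
            if not ok:
                b.discard((g1, g2))
                changed = True
    return b


def bisimilar(m1: Edts, m2: Edts) -> bool:
    """M1 ~ M2 iff the greatest bisimulation also satisfies (init)."""
    b = greatest_bisimulation(m1, m2)
    return (all(any((g1, g2) in b for g2 in m2.init) for g1 in m1.init)
            and all(any((g1, g2) in b for g1 in m1.init) for g2 in m2.init))
```

Checking (init) only against the greatest bisimulation is enough, since any bisimulation satisfying (init) is contained in it. For the two edts of the example in Sect. 3.1 (a single looping control state versus two alternating control states with the same data), `bisimilar` returns True.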


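Now we are able to establish a Hennessy-Milner-like correspondence for a fragment of $\mathcal{E}^{\downarrow}$. Let us call hybrid-free sentences of $\mathcal{E}^{\downarrow}$ the formulae obtained by the grammar

$\varrho ::= \varphi \mid \langle\alpha\rangle\varrho \mid \mathrm{true} \mid \neg\varrho \mid \varrho_1 \vee \varrho_2$.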
Theorem 1. Let $M_1, M_2$ be bisimilar $\Sigma$-edts. Then $M_1 \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$ iff $M_2 \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$ for all hybrid-free sentences $\varrho$.


The converse of Theorem 1 does not hold in general, and the usual image-finiteness assumption has to be imposed: a $\Sigma$-edts $M$ is image-finite if, for all $\gamma \in \Gamma(M)$ and all $e \in E(\Sigma)$, the set $\{\gamma' \mid (\gamma, \gamma') \in R(M)_e\}$ is finite. Then:


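Theorem 2. Let $M_1, M_2$ be image-finite $\Sigma$-edts and $\gamma_1 \in \Gamma(M_1)$, $\gamma_2 \in \Gamma(M_2)$ such that $M_1, \gamma_1 \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$ iff $M_2, \gamma_2 \models^{\mathcal{E}^{\downarrow}}_{\Sigma} \varrho$ for all hybrid-free sentences $\varrho$. Then there exists a bisimulation $B$ between $M_1$ and $M_2$ such that $(\gamma_1, \gamma_2) \in B$.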


3 Specifications of Event/Data Systems


3.1 Axiomatic Specifications 


Sentences of $\mathcal{E}^{\downarrow}$-logic can be used to specify properties of event/data systems and thus to write system specifications in an axiomatic way.


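Definition 5. An axiomatic ed specification $\mathit{Sp} = (\Sigma(\mathit{Sp}), \mathit{Ax}(\mathit{Sp}))$ in $\mathcal{E}^{\downarrow}$ consists of an ed signature $\Sigma(\mathit{Sp}) \in \mathrm{Sig}^{\mathcal{E}^{\downarrow}}$ and a set of axioms $\mathit{Ax}(\mathit{Sp}) \subseteq \mathrm{Sen}^{\mathcal{E}^{\downarrow}}(\Sigma(\mathit{Sp}))$.

The semantics of $\mathit{Sp}$ is given by the pair $(\Sigma(\mathit{Sp}), \mathrm{Mod}(\mathit{Sp}))$, where $\mathrm{Mod}(\mathit{Sp}) = \{M \in \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma(\mathit{Sp})) \mid M \models^{\mathcal{E}^{\downarrow}}_{\Sigma(\mathit{Sp})} \mathit{Ax}(\mathit{Sp})\}$. The edts in $\mathrm{Mod}(\mathit{Sp})$ are called models of $\mathit{Sp}$, and $\mathrm{Mod}(\mathit{Sp})$ is the model class of $\mathit{Sp}$.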


As a direct consequence of Theorem 1 we have: 


Corollary 1. The model class of an axiomatic ed specification exclusively 
expressed by hybrid-free sentences is closed under bisimulation. 


This result does not hold for sentences with hybrid features. For instance, consider the specification $\mathit{Sp} = ((\{e\}, \{a\}), \{{\downarrow}x.\langle e /\!/ a' = a\rangle x\})$: an edts with a single control state $c_0$ and a loop transition $R_e = \{(\gamma_0, \gamma_0)\}$ for $c(\gamma_0) = c_0$ is a model of $\mathit{Sp}$. However, this is obviously not the case for its bisimilar edts with two control states $c_0$ and $c$ and the relation $R'_e = \{(\gamma'_0, \gamma'), (\gamma', \gamma'_0)\}$ with $c(\gamma'_0) = c_0$, $c(\gamma') = c$ and $\omega(\gamma'_0) = \omega(\gamma')$.


Example 1. As a running example we consider an ATM. We start with an abstract specification $\mathit{Sp}_0$ of fundamental requirements for its interaction behaviour, based on the set of events $E_0 = \{\mathit{insertCard}, \mathit{enterPIN}, \mathit{ejectCard}, \mathit{cancel}\}$² and on the singleton set of attributes $A_0 = \{\mathit{chk}\}$, where $\mathit{chk}$ is boolean valued and records the correctness of an entered PIN. Hence our first ed signature is $\Sigma_0 = (E_0, A_0)$ and $\mathit{Sp}_0 = (\Sigma_0, \mathit{Ax}_0)$, where $\mathit{Ax}_0$ requires the following properties expressed by corresponding axioms (0.1-0.3):


² To shorten the presentation, we omit further events like withdrawing money, etc.
- “Whenever a card has been inserted, a correct PIN can eventually be entered and also the transaction can eventually be cancelled.”

$[E^*; \mathit{insertCard}](\langle E^*; \mathit{enterPIN} /\!/ \mathit{chk}' = \mathrm{tt}\rangle\mathrm{true} \wedge \langle E^*; \mathit{cancel}\rangle\mathrm{true})$ (0.1)


- “Whenever either a correct PIN has been entered or the transaction has been cancelled, the card can eventually be ejected.”

$[E^*; (\mathit{enterPIN} /\!/ \mathit{chk}' = \mathrm{tt}) + \mathit{cancel}]\langle E^*; \mathit{ejectCard}\rangle\mathrm{true}$ (0.2)


- “Whenever an incorrect PIN has been entered three times in a row, the current card is not returned.” This means that the card is kept by the ATM, which is not modelled by an extra event. It may, however, still be possible that another card is inserted afterwards. So an ejectCard can only be forbidden as long as no next card is inserted.

$[E^*; (\mathit{enterPIN} /\!/ \mathit{chk}' = \mathrm{ff})^3; (-\mathit{insertCard})^*; \mathit{ejectCard}]\mathrm{false}$ (0.3)

where $\alpha^n$ abbreviates the $n$-fold sequential composition $\alpha; \ldots; \alpha$.


The semantics of an axiomatic ed specification is loose, usually allowing for many different realisations. A refinement step is therefore understood as a restriction of the model class of an abstract specification. Following the terminology of Sannella and Tarlecki [18,19], we call a specification refining another one an implementation. Formally, a specification $\mathit{Sp}'$ is a simple implementation of a specification $\mathit{Sp}$ over the same signature, in symbols $\mathit{Sp} \rightsquigarrow \mathit{Sp}'$, whenever $\mathrm{Mod}(\mathit{Sp}) \supseteq \mathrm{Mod}(\mathit{Sp}')$. Transitivity of the inclusion relation ensures gradual step-by-step development by a series of refinements.


Example 2. We provide a refinement $\mathit{Sp}_0 \rightsquigarrow \mathit{Sp}_1$, where $\mathit{Sp}_1 = (\Sigma_0, \mathit{Ax}_1)$ has the same signature as $\mathit{Sp}_0$ and $\mathit{Ax}_1$ consists of the sentences (1.1-1.4) below; the last two use binders to specify a loop. As is easily seen, all models of $\mathit{Sp}_1$ must satisfy the axioms of $\mathit{Sp}_0$.


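- “At the beginning a card can be inserted with the effect that chk is set to ff; nothing else is possible at the beginning.”

$\langle \mathit{insertCard} /\!/ \mathit{chk}' = \mathrm{ff}\rangle\mathrm{true} \wedge [\mathit{insertCard} /\!/ \neg(\mathit{chk}' = \mathrm{ff})]\mathrm{false} \wedge [-\mathit{insertCard}]\mathrm{false}$ (1.1)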


- “Whenever a card has been inserted, a PIN can be entered (directly afterwards) and also the transaction can be cancelled; but nothing else.”

$[E^*; \mathit{insertCard}](\langle \mathit{enterPIN}\rangle\mathrm{true} \wedge \langle \mathit{cancel}\rangle\mathrm{true} \wedge [-\{\mathit{enterPIN}, \mathit{cancel}\}]\mathrm{false})$ (1.2)
- “Whenever either a correct PIN has been entered or the transaction has been cancelled, the card can eventually be ejected and the ATM starts from the beginning.”

${\downarrow}x_0.[E^*; (\mathit{enterPIN} /\!/ \mathit{chk}' = \mathrm{tt}) + \mathit{cancel}]\langle E^*; \mathit{ejectCard}\rangle x_0$ (1.3)


- “Whenever an incorrect PIN has been entered three times in a row, the ATM starts from the beginning.” Hence the current card is kept.

${\downarrow}x_0.[E^*; (\mathit{enterPIN} /\!/ \mathit{chk}' = \mathrm{ff})^3]x_0$ (1.4)


3.2 Operational Specifications 


Operational event/data specifications are introduced as a means to specify the properties of event/data systems in a more constructive style. They are not appropriate for writing abstract requirements, for which axiomatic specifications are recommended. Though $\mathcal{E}^{\downarrow}$-logic is able to specify concrete models, as discussed in Sect. 2, operational specifications allow a graphical representation close to familiar formalisms in the literature, like UML protocol state machines, cf. [12,16]. As will be shown in Sect. 3.3, finite operational specifications can be characterised by a sentence in $\mathcal{E}^{\downarrow}$-logic. Therefore, $\mathcal{E}^{\downarrow}$-logic is still the common basis of our development approach. Transitions in an operational specification are tuples $(c, \varphi, e, \psi, c')$, with $c$ a source control state, $\varphi$ a precondition, $e$ an event, $\psi$ a state transition predicate specifying the possible effects of the event $e$, and $c'$ a target control state. In the semantic models an event must be enabled whenever the respective source data state satisfies the precondition. Thus isolating preconditions has a semantic consequence that is not expressible by transition predicates alone. The effect of the event must respect $\psi$; no other transitions are allowed.


Definition 6. An operational ed specification $O = (\Sigma, C, T, (c_0, \varphi_0))$ is given by an ed signature $\Sigma$, a set of control states $C$, a transition relation specification $T \subseteq C \times \Phi(A(\Sigma)) \times E(\Sigma) \times \Psi(A(\Sigma)) \times C$, an initial control state $c_0 \in C$, and an initial state predicate $\varphi_0 \in \Phi(A(\Sigma))$, such that $C$ is syntactically reachable, i.e., for every $c \in C \setminus \{c_0\}$ there are $(c_0, \varphi_1, e_1, \psi_1, c_1), \ldots, (c_{n-1}, \varphi_n, e_n, \psi_n, c_n) \in T$ with $n > 0$ such that $c_n = c$. We write $\Sigma(O)$ for $\Sigma$, etc.

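A $\Sigma$-edts $M$ is a model of $O$ if $C(M) = C$ up to a bijective renaming, $c_0(M) = c_0$, $\Omega_0(M) \subseteq \{\omega \mid \omega \models^{A(\Sigma)} \varphi_0\}$, and if the following conditions hold:

- for all $(c, \varphi, e, \psi, c') \in T$ and $\omega \in \Omega(A(\Sigma))$ with $\omega \models^{A(\Sigma)} \varphi$, there is a $((c, \omega), (c', \omega')) \in R(M)_e$ with $(\omega, \omega') \models^{A(\Sigma)} \psi$;
- for all $((c, \omega), (c', \omega')) \in R(M)_e$, there is a $(c, \varphi, e, \psi, c') \in T$ with $\omega \models^{A(\Sigma)} \varphi$ and $(\omega, \omega') \models^{A(\Sigma)} \psi$.

The class of all models of $O$ is denoted by $\mathrm{Mod}(O)$. The semantics of $O$ is given by the pair $(\Sigma(O), \mathrm{Mod}(O))$ where $\Sigma(O) = \Sigma$.

[Figure: control states Card (initial), PIN and Return, with the transitions
Card → PIN: insertCard // chk' = ff ∧ trls' = 0
PIN → PIN: trls < 2 ▷ enterPIN // chk' = ff ∧ trls' = trls + 1
PIN → Return: trls < 2 ▷ enterPIN // chk' = tt ∧ trls' = trls + 1
PIN → Card: trls = 2 ▷ enterPIN // chk' = ff ∧ trls' = trls + 1
PIN → Return: cancel // chk' = ff ∧ trls' = trls
Return → Card: ejectCard // chk' = chk ∧ trls' = trls]

Fig. 1. Operational ed specification ATM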


Example 3. We construct an operational ed specification, called ATM, for the ATM example. The signature of ATM extends the one of $Sp_1$ (and $Sp_0$) by an additional integer-valued attribute trls which counts the number of attempts to enter a correct PIN (with the same card). ATM is graphically presented in Fig. 1. The initial control state is Card, and the initial state predicate is true. Preconditions are written before the symbol ▷. If no precondition is explicitly indicated, it is assumed to be true. Due to the extended signature, ATM is not a simple implementation of $Sp_1$, and we will only formally justify the implementation relationship in Example 5.
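To see what ATM describes operationally, the following Python sketch encodes its transitions as we read them from Fig. 1 and checks the syntactic reachability condition of Definition 6. Because every transition predicate of ATM happens to fix the post-state uniquely, the sketch also explores, by breadth-first search, the configurations reachable when each transition is executed with that effect. The initial data state chk = ff, trls = 0 is an arbitrary choice satisfying the initial predicate true, and the tuple encoding is an assumption for illustration; this is not a construction of a model in the sense of Definition 6.

```python
from collections import deque

# Transitions (source, precondition, event, effect, target) of ATM. A data
# state is a pair (chk, trls). Effects are functions because each transition
# predicate of ATM fixes the post-state uniquely (an assumption that does not
# hold for transition predicates in general).
ATM = [
    ("Card", lambda chk, t: True, "insertCard", lambda chk, t: (False, 0), "PIN"),
    ("PIN", lambda chk, t: t < 2, "enterPIN", lambda chk, t: (False, t + 1), "PIN"),
    ("PIN", lambda chk, t: t < 2, "enterPIN", lambda chk, t: (True, t + 1), "Return"),
    ("PIN", lambda chk, t: t == 2, "enterPIN", lambda chk, t: (False, t + 1), "Card"),
    ("PIN", lambda chk, t: True, "cancel", lambda chk, t: (False, t), "Return"),
    ("Return", lambda chk, t: True, "ejectCard", lambda chk, t: (chk, t), "Card"),
]


def syntactically_reachable(transitions, c0):
    """Control states reachable from c0 by following transitions (Definition 6)."""
    seen, todo = {c0}, [c0]
    while todo:
        c = todo.pop()
        for (src, _pre, _event, _eff, tgt) in transitions:
            if src == c and tgt not in seen:
                seen.add(tgt)
                todo.append(tgt)
    return seen


def reachable_configs(transitions, c0, w0):
    """Breadth-first search over configurations (control state, chk, trls)."""
    start = (c0, *w0)
    seen, queue = {start}, deque([start])
    while queue:
        c, chk, t = queue.popleft()
        for (src, pre, _event, eff, tgt) in transitions:
            if src == c and pre(chk, t):
                nxt = (tgt, *eff(chk, t))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


control_states = {t[0] for t in ATM} | {t[-1] for t in ATM}
assert syntactically_reachable(ATM, "Card") == control_states
configs = reachable_configs(ATM, "Card", (False, 0))  # chk = ff, trls = 0 chosen arbitrarily
```

The search stays finite because trls is reset by insertCard and can grow at most to 3, namely after the third incorrect PIN.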


Operational specifications can be composed by a syntactic parallel composition operator which synchronises shared events. Two ed signatures $\Sigma_1$ and $\Sigma_2$ are composable if $A(\Sigma_1) \cap A(\Sigma_2) = \emptyset$. Their parallel composition is given by $\Sigma_1 \otimes \Sigma_2 = (E(\Sigma_1) \cup E(\Sigma_2), A(\Sigma_1) \cup A(\Sigma_2))$.


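Definition 7. Let $\Sigma_1$ and $\Sigma_2$ be composable ed signatures and let $O_1$ and $O_2$ be operational ed specifications with $\Sigma(O_1) = \Sigma_1$ and $\Sigma(O_2) = \Sigma_2$. The parallel composition of $O_1$ and $O_2$ is given by the operational ed specification $O_1 \parallel O_2 = (\Sigma_1 \otimes \Sigma_2, C, T, (c_0, \varphi_0))$ with $c_0 = (c_0(O_1), c_0(O_2))$, $\varphi_0 = \varphi_0(O_1) \wedge \varphi_0(O_2)$, and $C$ and $T$ are inductively defined by $c_0 \in C$ and
- for $e_1 \in E(\Sigma_1) \setminus E(\Sigma_2)$, $c_1, c_1' \in C(O_1)$, and $c_2 \in C(O_2)$, if $(c_1, c_2) \in C$ and $(c_1, \varphi_1, e_1, \psi_1, c_1') \in T(O_1)$, then $(c_1', c_2) \in C$ and $((c_1, c_2), \varphi_1, e_1, \psi_1 \wedge \mathit{id}_{A(\Sigma_2)}, (c_1', c_2)) \in T$;
- for $e_2 \in E(\Sigma_2) \setminus E(\Sigma_1)$, $c_2, c_2' \in C(O_2)$, and $c_1 \in C(O_1)$, if $(c_1, c_2) \in C$ and $(c_2, \varphi_2, e_2, \psi_2, c_2') \in T(O_2)$, then $(c_1, c_2') \in C$ and $((c_1, c_2), \varphi_2, e_2, \psi_2 \wedge \mathit{id}_{A(\Sigma_1)}, (c_1, c_2')) \in T$;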


A Hybrid Dynamic Logic for Event/Data-Based Systems 89 


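- for $e \in E(\Sigma_1) \cap E(\Sigma_2)$, $c_1, c_1' \in C(O_1)$, and $c_2, c_2' \in C(O_2)$, if $(c_1, c_2) \in C$, $(c_1, \varphi_1, e, \psi_1, c_1') \in T(O_1)$, and $(c_2, \varphi_2, e, \psi_2, c_2') \in T(O_2)$, then $(c_1', c_2') \in C$ and $((c_1, c_2), \varphi_1 \wedge \varphi_2, e, \psi_1 \wedge \psi_2, (c_1', c_2')) \in T$.³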
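Since Definition 7 is purely syntactic, it can be executed directly on specifications whose predicates are kept as strings. The Python sketch below builds the reachable part of $O_1 \parallel O_2$, interleaving non-shared events (conjoining the identity on the other component's attributes, rendered here as the hypothetical string `id(...)`) and synchronising shared events. The dictionary encoding and the two small example specifications are assumptions for illustration.

```python
from collections import deque


def compose(o1, o2):
    """Syntactic parallel composition O1 || O2 along the lines of Definition 7.

    A specification is a dict with keys 'events', 'attrs' (sets), 'trans'
    (list of (c, pre, e, psi, c2) with predicates kept as strings),
    'c0' and 'phi0'. Only control states reachable from c0 are built.
    """
    assert not (o1["attrs"] & o2["attrs"]), "signatures must be composable"
    shared = o1["events"] & o2["events"]
    keep2 = "id(" + ",".join(sorted(o2["attrs"])) + ")"   # id_{A(Sigma_2)}
    keep1 = "id(" + ",".join(sorted(o1["attrs"])) + ")"   # id_{A(Sigma_1)}
    c0 = (o1["c0"], o2["c0"])
    states, trans, queue = {c0}, [], deque([c0])
    while queue:
        c1, c2 = queue.popleft()
        moves = []
        for (s, pre, e, psi, t) in o1["trans"]:
            if s == c1 and e not in shared:                       # O1 moves alone
                moves.append((pre, e, f"{psi} ∧ {keep2}", (t, c2)))
        for (s, pre, e, psi, t) in o2["trans"]:
            if s == c2 and e not in shared:                       # O2 moves alone
                moves.append((pre, e, f"{psi} ∧ {keep1}", (c1, t)))
        for (s1, pre1, e1, psi1, t1) in o1["trans"]:
            for (s2, pre2, e2, psi2, t2) in o2["trans"]:
                if s1 == c1 and s2 == c2 and e1 == e2 and e1 in shared:  # joint move
                    moves.append((f"{pre1} ∧ {pre2}", e1, f"{psi1} ∧ {psi2}", (t1, t2)))
        for (pre, e, psi, tgt) in moves:
            trans.append(((c1, c2), pre, e, psi, tgt))
            if tgt not in states:
                states.add(tgt)
                queue.append(tgt)
    return {"events": o1["events"] | o2["events"], "attrs": o1["attrs"] | o2["attrs"],
            "states": states, "trans": trans, "c0": c0,
            "phi0": f"{o1['phi0']} ∧ {o2['phi0']}"}


client = {"events": {"tick", "req", "ack"}, "attrs": {"x"}, "c0": "R0", "phi0": "x = 0",
          "trans": [("R0", "true", "tick", "x' = x + 1", "R0"),
                    ("R0", "true", "req", "x' = x", "R1"),
                    ("R1", "true", "ack", "x' = x", "R0")]}
server = {"events": {"req", "ack"}, "attrs": {"cnt"}, "c0": "Idle", "phi0": "cnt = 0",
          "trans": [("Idle", "true", "req", "cnt' = cnt", "Busy"),
                    ("Busy", "true", "ack", "cnt' = cnt + 1", "Idle")]}
par = compose(client, server)
```

Here only the local event tick interleaves; req and ack are shared, so the composition has just the two control states (R0, Idle) and (R1, Busy).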


3.3 Expressiveness of $\mathcal{E}^{\downarrow}$-Logic


We show that the semantics of an operational ed specification $O$ with finitely many control states can be characterised by a single $\mathcal{E}^{\downarrow}$-sentence $\varrho_O$, i.e., an edts $M$ is a model of $O$ iff $M \models \varrho_O$. Using Algorithm 1, such a characterising sentence is

$$\varrho_O = {\downarrow} c_0 \,.\, \varphi_0 \wedge \mathit{sen}(c_0, \mathrm{Im}_O(c_0), C(O), \{c_0\}),$$

where $c_0 = c_0(O)$ and $\varphi_0 = \varphi_0(O)$. Algorithm 1 closely follows the procedure in [15] for characterising a finite structure by a sentence of $\mathcal{D}^{\downarrow}$-logic. A call $\mathit{sen}(c, I, V, B)$ performs a recursive breadth-first traversal through $O$ starting from $c$, where $I$ holds the unprocessed quadruples $(\varphi, e, \psi, c')$ of transitions outgoing from $c$, $V$ the remaining states to visit, and $B$ the set of already bound states. In the resulting formula, the function first requires the existence of each outgoing transition in $I$, provided its precondition holds, and binds any newly reached state. Then, using calls to $\mathit{fin}$, it requires that no other transitions with source state $c$ exist. Having visited all states in $V$, it finally requires all states in $C(O)$ to be pairwise different.


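Algorithm 1. Constructing a sentence from an operational ed specification
Require: $O$ = finite operational ed specification
$\mathrm{Im}_O(c) = \{(\varphi, e, \psi, c') \mid (c, \varphi, e, \psi, c') \in T(O)\}$ for $c \in C(O)$
$\mathrm{Im}_O(c, e) = \{(\varphi, \psi, c') \mid (c, \varphi, e, \psi, c') \in T(O)\}$ for $c \in C(O)$, $e \in E(\Sigma(O))$

function $\mathit{sen}(c, I, V, B)$   ▷ $c$: state, $I$: image to visit, $V$: states to visit, $B$: bound states
  if $I \neq \emptyset$ then
    $(\varphi, e, \psi, c') \leftarrow$ choose $I$
    if $c' \in B$ then
      return $@c \,.\, \varphi \to \langle e /\!/ \psi \rangle (c' \wedge \mathit{sen}(c, I \setminus \{(\varphi, e, \psi, c')\}, V, B))$
    else
      return $@c \,.\, \varphi \to \langle e /\!/ \psi \rangle {\downarrow} c' \,.\, \mathit{sen}(c, I \setminus \{(\varphi, e, \psi, c')\}, V, B \cup \{c'\})$
  $V \leftarrow V \setminus \{c\}$
  if $V \neq \emptyset$ then
    $c' \leftarrow$ choose $B \cap V$
    return $\mathit{fin}(c) \wedge \mathit{sen}(c', \mathrm{Im}_O(c'), V, B)$
  return $\mathit{fin}(c) \wedge \bigwedge_{c_1 \in C(O),\, c_2 \in C(O) \setminus \{c_1\}} @c_1 \,.\, \neg c_2$
function $\mathit{fin}(c)$
  return $@c \,.\, \bigwedge_{e \in E(\Sigma(O))} \bigwedge_{P \subseteq \mathrm{Im}_O(c, e)} \big[e /\!/ \big(\bigwedge_{(\varphi, \psi, c') \in P} \varphi \wedge \psi\big) \wedge \neg \big(\bigvee_{(\varphi, \psi, c') \in \mathrm{Im}_O(c, e) \setminus P} \varphi \wedge \psi\big)\big] \bigvee_{(\varphi, \psi, c') \in P} c'$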


3 Note that joint moves with e cannot become inconsistent due to composability of ed 
signatures. 
It is in $\mathit{fin}(c)$ that this algorithm mainly deviates from [15]: to ensure that no other transitions from $c$ exist than those specified in $O$, $\mathit{fin}(c)$ produces the requirement that at state $c$, for every event $e$ and for every subset $P$ of the transitions outgoing from $c$, whenever an $e$-transition can be done with the combined effect of $P$ but not adhering to any of the effects of the currently not selected transitions, the $e$-transition must have one of the target states of $P$ as its target. The rather complicated formulation is due to possibly overlapping preconditions, where for a single event $e$ the preconditions of two different transitions may be satisfied simultaneously. For a state $c$ where all outgoing transitions for the same event have disjoint preconditions, the $\mathcal{E}^{\downarrow}$-formula returned by $\mathit{fin}(c)$ is equivalent to

$$@c \,.\, \bigwedge_{e \in E(\Sigma(O))} \Big( \bigwedge_{(\varphi, \psi, c') \in \mathrm{Im}_O(c, e)} [e /\!/ \varphi \wedge \psi]\, c' \;\wedge\; \Big[e /\!/ \neg \bigvee_{(\varphi, \psi, c') \in \mathrm{Im}_O(c, e)} (\varphi \wedge \psi)\Big]\, \mathit{false} \Big).$$
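A string-building Python sketch may help to see how $\mathit{sen}$ and $\mathit{fin}$ interact. It rests on stated assumptions: formulas are produced as plain strings without any simplification, both occurrences of "choose" are resolved deterministically (first unprocessed transition, alphabetically smallest state in $B \cap V$), and the two-state specification at the end is a hypothetical example, not one from the paper. Since $\mathit{fin}$ enumerates all subsets $P \subseteq \mathrm{Im}_O(c, e)$, its output grows exponentially in the number of transitions per event.

```python
from itertools import combinations


def conj(parts):
    return " ∧ ".join(parts) if parts else "true"    # empty conjunction


def disj(parts):
    return " ∨ ".join(parts) if parts else "false"   # empty disjunction


def characterise(spec):
    """Build ↓c0 . φ0 ∧ sen(c0, Im(c0), C, {c0}) as a plain string."""
    trans, events, c0 = spec["trans"], sorted(spec["events"]), spec["c0"]
    states = sorted({t[0] for t in trans} | {t[4] for t in trans} | {c0})

    def im(c):                       # Im_O(c): quadruples (φ, e, ψ, c')
        return [(p, e, q, t) for (s, p, e, q, t) in trans if s == c]

    def im_e(c, e):                  # Im_O(c, e): triples (φ, ψ, c')
        return [(p, q, t) for (s, p, e2, q, t) in trans if s == c and e2 == e]

    def fin(c):
        clauses = []
        for e in events:
            img = im_e(c, e)
            for k in range(len(img) + 1):
                for chosen in combinations(range(len(img)), k):   # subsets P
                    inside = [f"({img[i][0]} ∧ {img[i][1]})" for i in chosen]
                    outside = [f"({p} ∧ {q})" for j, (p, q, _t) in enumerate(img)
                               if j not in chosen]
                    targets = [img[i][2] for i in chosen]
                    clauses.append(f"[{e} // {conj(inside)} ∧ ¬({disj(outside)})]({disj(targets)})")
        return f"@{c} . ({conj(clauses)})"

    def sen(c, todo, visit, bound):
        if todo:                                   # 'choose' = first element
            (p, e, q, t), rest = todo[0], todo[1:]
            if t in bound:
                inner = sen(c, rest, visit, bound)
                return f"@{c} . ({p} → ⟨{e} // {q}⟩({t} ∧ {inner}))"
            inner = sen(c, rest, visit, bound | {t})
            return f"@{c} . ({p} → ⟨{e} // {q}⟩↓{t} . {inner})"
        visit = visit - {c}
        if visit:
            nxt = sorted(bound & visit)[0]         # 'choose' from B ∩ V
            return f"{fin(c)} ∧ {sen(nxt, im(nxt), visit, bound)}"
        distinct = [f"@{a} . ¬{b}" for a in states for b in states if a != b]
        return f"{fin(c)} ∧ {conj(distinct)}"

    body = sen(c0, im(c0), set(states), {c0})
    return f"↓{c0} . {spec['phi0']} ∧ {body}"


toy = {"events": {"a", "b"}, "c0": "A", "phi0": "true",
       "trans": [("A", "true", "a", "x' = 1", "B"), ("B", "true", "b", "true", "A")]}
print(characterise(toy))
```

For the toy specification, the clause `[b // true ∧ ¬(false)](false)` produced by fin(A) is the "no b-transition from A" requirement, i.e., the $P = \emptyset$ case of the disjoint-precondition formula above.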


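Example 4. We show the first few steps of representing the operational ed specification ATM of Fig. 1 as an $\mathcal{E}^{\downarrow}$-sentence $\varrho_{\mathit{ATM}}$. This top-level sentence is

$${\downarrow} \mathit{Card} \,.\, \mathit{true} \wedge \mathit{sen}(\mathit{Card}, \{(\mathit{true}, \mathit{insertCard}, \mathit{chk}' = \mathit{ff} \wedge \mathit{trls}' = 0, \mathit{PIN})\}, \{\mathit{Card}, \mathit{PIN}, \mathit{Return}\}, \{\mathit{Card}\}).$$

The first call of $\mathit{sen}(\mathit{Card}, \ldots)$ explores the single outgoing transition from Card to PIN, adds PIN to the bound states, and hence expands to

$$@\mathit{Card} \,.\, \mathit{true} \to \langle \mathit{insertCard} /\!/ \mathit{chk}' = \mathit{ff} \wedge \mathit{trls}' = 0 \rangle {\downarrow} \mathit{PIN} \,.\, \mathit{sen}(\mathit{Card}, \emptyset, \{\mathit{Card}, \mathit{PIN}, \mathit{Return}\}, \{\mathit{Card}, \mathit{PIN}\}).$$

Now all outgoing transitions from Card have been explored and the next call $\mathit{sen}(\mathit{Card}, \emptyset, \ldots)$ removes Card from the set of states to be visited, resulting in

$$\mathit{fin}(\mathit{Card}) \wedge \mathit{sen}(\mathit{PIN}, \{(\mathit{trls} < 2, \mathit{enterPIN}, \ldots), (\mathit{trls} = 2, \mathit{enterPIN}, \ldots), (\mathit{trls} < 2, \mathit{enterPIN}, \ldots), (\mathit{true}, \mathit{cancel}, \ldots)\}, \{\mathit{PIN}, \mathit{Return}\}, \{\mathit{Card}, \mathit{PIN}\}).$$

As there is only a single outgoing transition from Card, the special case of disjoint preconditions applies for the finalisation call, and $\mathit{fin}(\mathit{Card})$ results in

$$@\mathit{Card} \,.\, [\mathit{insertCard} /\!/ \mathit{chk}' = \mathit{ff} \wedge \mathit{trls}' = 0]\, \mathit{PIN} \wedge [\mathit{insertCard} /\!/ \mathit{chk}' = \mathit{tt} \vee \mathit{trls}' \neq 0]\, \mathit{false} \wedge [\mathit{enterPIN} /\!/ \mathit{true}]\, \mathit{false} \wedge [\mathit{cancel} /\!/ \mathit{true}]\, \mathit{false} \wedge [\mathit{ejectCard} /\!/ \mathit{true}]\, \mathit{false}.$$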


4 Constructor Implementations 


The implementation notion defined in Sect. 3.1 is too simple for many practical 
applications. It requires the same signature for specification and implementation 
and does not support the process of constructing an implementation. Therefore, 
Sannella and Tarlecki [18,19] have proposed the notion of constructor implementation, which is a generic notion applicable to specification formalisms that are based on signatures and semantic structures for signatures. We will reuse these ideas in the context of $\mathcal{E}^{\downarrow}$-logic.

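The notion of constructor is the basis: for signatures $\Sigma_1, \ldots, \Sigma_n, \Sigma \in \mathrm{Sig}^{\mathcal{E}^{\downarrow}}$, a constructor $\kappa$ from $(\Sigma_1, \ldots, \Sigma_n)$ to $\Sigma$ is a (total) function $\kappa : \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma_1) \times \cdots \times \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma_n) \to \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma)$. Given a constructor $\kappa$ from $(\Sigma_1, \ldots, \Sigma_n)$ to $\Sigma$ and a set of constructors $\kappa_i$ from $(\Sigma_i^1, \ldots, \Sigma_i^{k_i})$ to $\Sigma_i$, $1 \leq i \leq n$, the constructor $(\kappa_1, \ldots, \kappa_n); \kappa$ from $(\Sigma_1^1, \ldots, \Sigma_1^{k_1}, \ldots, \Sigma_n^1, \ldots, \Sigma_n^{k_n})$ to $\Sigma$ is obtained by the usual composition of functions. The following definitions apply to both axiomatic and operational ed specifications, since the semantics of both is given in terms of ed signatures and model classes of edts. In particular, the implementation notion allows one to implement axiomatic specifications by operational specifications.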


Definition 8. Given specifications $Sp, Sp_1, \ldots, Sp_n$ and a constructor $\kappa$ from $(\Sigma(Sp_1), \ldots, \Sigma(Sp_n))$ to $\Sigma(Sp)$, the tuple $(Sp_1, \ldots, Sp_n)$ is a constructor implementation via $\kappa$ of $Sp$, in symbols $Sp \rightsquigarrow_{\kappa} (Sp_1, \ldots, Sp_n)$, if for all $M_i \in \mathrm{Mod}(Sp_i)$ we have $\kappa(M_1, \ldots, M_n) \in \mathrm{Mod}(Sp)$. The implementation involves a decomposition if $n > 1$.
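For finitely many sample models, the quantifier structure of Definition 8 can be spelled out directly. The Python sketch below is illustrative only: actual model classes are typically infinite, so such a check can refute, but never establish, a constructor implementation. The toy "models" (sets of event names), the union constructor and the requirement on $Sp$ are hypothetical stand-ins for edts, constructors and specifications.

```python
from itertools import product
from typing import Callable, Iterable, Sequence


def is_constructor_implementation(is_model_of_sp: Callable[[object], bool],
                                  kappa: Callable[..., object],
                                  model_classes: Sequence[Iterable[object]]) -> bool:
    """Check kappa(M_1, ..., M_n) ∈ Mod(Sp) for all given M_i ∈ Mod(Sp_i)."""
    return all(is_model_of_sp(kappa(*models)) for models in product(*model_classes))


# Toy stand-ins: a "model" is a frozenset of event names, kappa is union,
# and Sp requires the event "e" to be present.
mods_sp1 = [frozenset({"e"}), frozenset({"e", "f"})]
mods_sp2 = [frozenset(), frozenset({"g"})]
print(is_constructor_implementation(lambda m: "e" in m, lambda m1, m2: m1 | m2,
                                    [mods_sp1, mods_sp2]))
```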


The notion of simple implementation in Sect. 3.1 is captured by choosing the identity as constructor. We now introduce a set of more advanced constructors in the context of ed signatures and edts. Let us first consider two central notions for constructors: signature morphisms and reducts. For data signatures $A, A'$ a data signature morphism $\sigma : A \to A'$ is a function from $A$ to $A'$. The $\sigma$-reduct of an $A'$-data state $\omega' : A' \to D$ is given by the $A$-data state $\omega'|_\sigma : A \to D$ defined by $(\omega'|_\sigma)(a) = \omega'(\sigma(a))$ for every $a \in A$. If $A \subseteq A'$, the injection of $A$ into $A'$ is a particular data signature morphism and we denote the reduct of an $A'$-data state $\omega'$ to $A$ by $\omega'|_A$. If $A = A_1 \cup A_2$ is the disjoint union of $A_1$ and $A_2$ and $\omega_i$ are $A_i$-data states for $i \in \{1, 2\}$, then $\omega_1 + \omega_2$ denotes the unique $A$-data state $\omega$ with $\omega|_{A_i} = \omega_i$ for $i \in \{1, 2\}$. The $\sigma$-reduct $\gamma'|_\sigma$ of a configuration $\gamma' = (c, \omega')$ is given by $(c, \omega'|_\sigma)$, and is lifted to a set of configurations $\Gamma'$ by $\Gamma'|_\sigma = \{\gamma'|_\sigma \mid \gamma' \in \Gamma'\}$.
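The reduct and sum operations on data states are simple enough to write down directly. In the following Python sketch a data state is a dictionary from attribute names to values, and a data signature morphism $\sigma : A \to A'$ is a dictionary from the attributes of $A$ to those of $A'$; these representations and the function names are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple

DataState = Dict[str, object]          # attribute name -> value in the data domain D


def reduct(w_prime: DataState, sigma: Dict[str, str]) -> DataState:
    """sigma-reduct of an A'-data state: (w'|sigma)(a) = w'(sigma(a)) for a in A."""
    return {a: w_prime[sigma[a]] for a in sigma}


def restrict(w_prime: DataState, attrs: Iterable[str]) -> DataState:
    """Reduct along the injection of A into A' (written w'|A in the text)."""
    return reduct(w_prime, {a: a for a in attrs})


def plus(w1: DataState, w2: DataState) -> DataState:
    """w1 + w2 for data states over disjoint attribute sets A1 and A2."""
    if w1.keys() & w2.keys():
        raise ValueError("attribute sets must be disjoint")
    return {**w1, **w2}


def config_reduct(config: Tuple[Hashable, DataState], sigma: Dict[str, str]):
    """(c, w')|sigma = (c, w'|sigma); sets of configurations are mapped pointwise."""
    c, w_prime = config
    return (c, reduct(w_prime, sigma))


w_prime = {"chk": True, "trls": 2, "cnt": 5}
print(restrict(w_prime, ["chk", "trls"]))        # {'chk': True, 'trls': 2}
print(reduct(w_prime, {"ok": "chk"}))            # {'ok': True}
```

The defining property of $\omega_1 + \omega_2$ can be checked directly: restricting `plus(w1, w2)` to the attributes of `w1` gives back `w1`.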


Definition 9. An ed signature morphism $\sigma = (\sigma_E, \sigma_A) : \Sigma \to \Sigma'$ is given by a function $\sigma_E : E(\Sigma) \to E(\Sigma')$ and a data signature morphism $\sigma_A : A(\Sigma) \to A(\Sigma')$. We abbreviate both $\sigma_E$ and $\sigma_A$ by $\sigma$.


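Definition 10. Let $\sigma : \Sigma \to \Sigma'$ be an ed signature morphism and $M'$ a $\Sigma'$-edts. The $\sigma$-reduct of $M'$ is the $\Sigma$-edts $M'|_\sigma = (\Gamma, R, \Gamma_0)$ such that $\Gamma_0 = \Gamma_0(M')|_\sigma$, and $\Gamma$ and $R = (R_e)_{e \in E(\Sigma)}$ are inductively defined by $\Gamma_0 \subseteq \Gamma$ and for all $e \in E(\Sigma)$, $\gamma', \gamma'' \in \Gamma(M')$: if $\gamma'|_\sigma \in \Gamma$ and $(\gamma', \gamma'') \in R(M')_{\sigma(e)}$, then $\gamma''|_\sigma \in \Gamma$ and $(\gamma'|_\sigma, \gamma''|_\sigma) \in R_e$.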
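For a finite edts, Definition 10 amounts to a reachability computation over reduced configurations, because $\Gamma$ and $R$ are the least sets closed under the stated rule. The Python sketch below assumes that configurations are pairs of a control state and a frozen data state and that $\sigma_E$ and $\sigma_A$ are given as dictionaries; all names and the small example edts are hypothetical.

```python
from collections import deque


def ds(**values):
    """Data state as a hashable frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def cfg_reduct(g, attr_map):
    """(c, w')|sigma where (w'|sigma)(a) = w'(sigma(a))."""
    c, w = g
    d = dict(w)
    return (c, frozenset((a, d[b]) for a, b in attr_map.items()))


def edts_reduct(init, rel, event_map, attr_map):
    """sigma-reduct (Gamma, R, Gamma_0) of a finite Sigma'-edts, as in Definition 10.

    init: initial configurations of M'; rel: event of Sigma' -> set of pairs;
    event_map: E(Sigma) -> E(Sigma'); attr_map: A(Sigma) -> A(Sigma').
    """
    gamma0 = {cfg_reduct(g, attr_map) for g in init}
    gamma, r = set(gamma0), {e: set() for e in event_map}
    queue = deque(gamma0)
    while queue:                               # least set closed under the rule
        g = queue.popleft()
        for e, e_prime in event_map.items():
            for (g1, g2) in rel.get(e_prime, ()):
                if cfg_reduct(g1, attr_map) == g:
                    h = cfg_reduct(g2, attr_map)
                    r[e].add((g, h))
                    if h not in gamma:
                        gamma.add(h)
                        queue.append(h)
    return gamma, r, gamma0


# Hypothetical M' with an extra event 'tick' and an extra attribute 'cnt'.
init = {("Card", ds(chk=False, cnt=0))}
rel = {"insertCard": {(("Card", ds(chk=False, cnt=0)), ("PIN", ds(chk=False, cnt=0)))},
       "tick": {(("PIN", ds(chk=False, cnt=0)), ("PIN", ds(chk=False, cnt=1)))}}
gamma, r, gamma0 = edts_reduct(init, rel, {"insertCard": "insertCard"}, {"chk": "chk"})
```

Events of $\Sigma'$ outside the image of $\sigma$, such as tick here, are simply not observed by the reduct, and the attribute cnt disappears from all configurations.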


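Definition 11. Let $\sigma : \Sigma \to \Sigma'$ be an ed signature morphism. The reduct constructor $\kappa_\sigma$ from $\Sigma'$ to $\Sigma$ maps any $M' \in \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma')$ to its reduct $\kappa_\sigma(M') = M'|_\sigma$. Whenever $\sigma_A$ and $\sigma_E$ are bijective functions, $\kappa_\sigma$ is a relabelling constructor. If $\sigma_E$ and $\sigma_A$ are injective, $\kappa_\sigma$ is a restriction constructor.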
Example 5. The operational specification ATM is a constructor implementation of $Sp_1$ via the restriction constructor $\kappa_\iota$ determined by the inclusion signature morphism $\iota : \Sigma(Sp_1) \to \Sigma(\mathit{ATM})$, i.e., $Sp_1 \rightsquigarrow_{\kappa_\iota} \mathit{ATM}$.


A further refinement technique for reactive systems (see, e.g., [8]) is the implementation of simple events by complex events, like their sequential composition. To formalise this as a constructor we use composite events $\Lambda(E)$ over a given set of events $E$, given by the grammar $\lambda ::= e \mid \lambda + \lambda \mid \lambda; \lambda \mid \lambda^*$ with $e \in E$. They are interpreted over an $(E, A)$-edts $M$ by $R(M)_{\lambda_1 + \lambda_2} = R(M)_{\lambda_1} \cup R(M)_{\lambda_2}$, $R(M)_{\lambda_1; \lambda_2} = R(M)_{\lambda_1}; R(M)_{\lambda_2}$, and $R(M)_{\lambda^*} = (R(M)_\lambda)^*$. Then we can introduce the intended constructor by means of reducts over signature morphisms mapping atomic to composite events:


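Definition 12. Let $\Sigma, \Sigma'$ be ed signatures, $D'$ a finite subset of $\Lambda(E(\Sigma'))$, $\Delta' = (D', A(\Sigma'))$, and $\alpha : \Sigma \to \Delta'$ an ed signature morphism. The event refinement constructor $\kappa_\alpha$ from $\Delta'$ to $\Sigma$ maps any $M' \in \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Delta')$ to its reduct $M'|_\alpha \in \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma)$.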
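Over a finite edts, the interpretation of composite events given before Definition 12 is ordinary relation algebra: union, relational composition, and reflexive-transitive closure. The Python sketch below encodes composite events as nested tuples (a representation chosen here for illustration) and computes $R(M)_\lambda$. The example reuses the refinement $\alpha(\mathit{enterPIN})$ from Example 6 on a hypothetical edts whose configurations are reduced to control states for brevity.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple, Union

Rel = Set[Tuple[Hashable, Hashable]]
# Composite events as nested tuples: "e" | ("+", l1, l2) | (";", l1, l2) | ("*", l)
Lam = Union[str, tuple]


def seq(r1: Rel, r2: Rel) -> Rel:
    """Relational composition r1 ; r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def star(r: Rel, configs: Iterable[Hashable]) -> Rel:
    """Reflexive-transitive closure of r over the given configurations."""
    result = {(g, g) for g in configs}
    while True:
        bigger = result | seq(result, r)
        if bigger == result:
            return result
        result = bigger


def interpret(lam: Lam, rel: Dict[str, Rel], configs) -> Rel:
    """R(M)_lambda for a finite edts whose atomic relations are rel[e]."""
    if isinstance(lam, str):
        return set(rel.get(lam, set()))
    op, *args = lam
    if op == "+":
        return interpret(args[0], rel, configs) | interpret(args[1], rel, configs)
    if op == ";":
        return seq(interpret(args[0], rel, configs), interpret(args[1], rel, configs))
    if op == "*":
        return star(interpret(args[0], rel, configs), configs)
    raise ValueError(f"unknown operator {op!r}")


# Configurations reduced to control states, for brevity (hypothetical data).
rel = {"enterPIN": {("PIN", "PINEntered")}, "verifyPIN": {("PINEntered", "Verifying")},
       "correctPIN": {("Verifying", "Return")}, "wrongPIN": {("Verifying", "PIN")}}
alpha_enter = (";", "enterPIN", (";", "verifyPIN", ("+", "correctPIN", "wrongPIN")))
print(sorted(interpret(alpha_enter, rel, set())))   # [('PIN', 'PIN'), ('PIN', 'Return')]
```

Since relational composition is associative, the nesting chosen for $\mathit{enterPIN}; \mathit{verifyPIN}; (\mathit{correctPIN} + \mathit{wrongPIN})$ does not affect the result.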


Finally, we consider a semantic, synchronous parallel composition constructor that allows for the decomposition of implementations into components which synchronise on shared events. Given two composable signatures $\Sigma_1$ and $\Sigma_2$, the parallel composition $\gamma_1 \otimes \gamma_2$ of two configurations $\gamma_1 = (c_1, \omega_1)$, $\gamma_2 = (c_2, \omega_2)$ with $\omega_1 \in \Omega(A(\Sigma_1))$, $\omega_2 \in \Omega(A(\Sigma_2))$ is given by $((c_1, c_2), \omega_1 + \omega_2)$, and lifted to two sets of configurations $\Gamma_1$ and $\Gamma_2$ by $\Gamma_1 \otimes \Gamma_2 = \{\gamma_1 \otimes \gamma_2 \mid \gamma_1 \in \Gamma_1, \gamma_2 \in \Gamma_2\}$.


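Definition 13. Let $\Sigma_1, \Sigma_2$ be composable ed signatures. The parallel composition constructor $\kappa_\otimes$ from $(\Sigma_1, \Sigma_2)$ to $\Sigma_1 \otimes \Sigma_2$ maps any $M_1 \in \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma_1)$, $M_2 \in \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma_2)$ to $M_1 \otimes M_2 = (\Gamma, R, \Gamma_0) \in \mathrm{Edts}^{\mathcal{E}^{\downarrow}}(\Sigma_1 \otimes \Sigma_2)$, where $\Gamma_0 = \Gamma_0(M_1) \otimes \Gamma_0(M_2)$, and $\Gamma$ and $R = (R_e)_{e \in E(\Sigma_1) \cup E(\Sigma_2)}$ are inductively defined by $\Gamma_0 \subseteq \Gamma$ and

- for all $e_1 \in E(\Sigma_1) \setminus E(\Sigma_2)$, $\gamma_1, \gamma_1' \in \Gamma(M_1)$, and $\gamma_2 \in \Gamma(M_2)$, if $\gamma_1 \otimes \gamma_2 \in \Gamma$ and $(\gamma_1, \gamma_1') \in R(M_1)_{e_1}$, then $\gamma_1' \otimes \gamma_2 \in \Gamma$ and $(\gamma_1 \otimes \gamma_2, \gamma_1' \otimes \gamma_2) \in R_{e_1}$;
- for all $e_2 \in E(\Sigma_2) \setminus E(\Sigma_1)$, $\gamma_2, \gamma_2' \in \Gamma(M_2)$, and $\gamma_1 \in \Gamma(M_1)$, if $\gamma_1 \otimes \gamma_2 \in \Gamma$ and $(\gamma_2, \gamma_2') \in R(M_2)_{e_2}$, then $\gamma_1 \otimes \gamma_2' \in \Gamma$ and $(\gamma_1 \otimes \gamma_2, \gamma_1 \otimes \gamma_2') \in R_{e_2}$;
- for all $e \in E(\Sigma_1) \cap E(\Sigma_2)$, $\gamma_1, \gamma_1' \in \Gamma(M_1)$, and $\gamma_2, \gamma_2' \in \Gamma(M_2)$, if $\gamma_1 \otimes \gamma_2 \in \Gamma$, $(\gamma_1, \gamma_1') \in R(M_1)_e$, and $(\gamma_2, \gamma_2') \in R(M_2)_e$, then $\gamma_1' \otimes \gamma_2' \in \Gamma$ and $(\gamma_1 \otimes \gamma_2, \gamma_1' \otimes \gamma_2') \in R_e$.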
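Definition 13 is a product construction in which non-shared events interleave and shared events synchronise. The following Python sketch computes the reachable part of $M_1 \otimes M_2$ for finite edts. The dictionary encoding, the frozenset data states (whose union is $\omega_1 + \omega_2$ because the attribute sets are disjoint), and the two small example systems are assumptions made for illustration.

```python
from collections import deque


def ds(**values):
    """Data state as a hashable frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def cfg_prod(g1, g2):
    """((c1, c2), w1 + w2); the union is w1 + w2 because the attributes are disjoint."""
    return ((g1[0], g2[0]), g1[1] | g2[1])


def edts_product(m1, m2):
    """M1 ⊗ M2 (Definition 13) for finite edts given as dicts with keys
    'events' (set), 'init' (set of configurations) and 'rel' (event -> pairs)."""
    shared = m1["events"] & m2["events"]
    start = {(g1, g2) for g1 in m1["init"] for g2 in m2["init"]}
    seen, queue = set(start), deque(start)
    rel = {e: set() for e in m1["events"] | m2["events"]}

    def step(e, src, tgt):                 # record a product transition
        rel[e].add((cfg_prod(*src), cfg_prod(*tgt)))
        if tgt not in seen:
            seen.add(tgt)
            queue.append(tgt)

    while queue:
        g1, g2 = queue.popleft()
        for e, pairs in m1["rel"].items():          # M1 moves alone
            for (a, b) in pairs:
                if a == g1 and e not in shared:
                    step(e, (g1, g2), (b, g2))
        for e, pairs in m2["rel"].items():          # M2 moves alone
            for (a, b) in pairs:
                if a == g2 and e not in shared:
                    step(e, (g1, g2), (g1, b))
        for e in shared:                            # synchronised moves
            for (a1, b1) in m1["rel"].get(e, ()):
                for (a2, b2) in m2["rel"].get(e, ()):
                    if a1 == g1 and a2 == g2:
                        step(e, (g1, g2), (b1, b2))
    gamma = {cfg_prod(*p) for p in seen}
    gamma0 = {cfg_prod(*p) for p in start}
    return gamma, rel, gamma0


m1 = {"events": {"a", "e"}, "init": {("p", ds(x=0))},
      "rel": {"a": {(("p", ds(x=0)), ("q", ds(x=0)))},
              "e": {(("q", ds(x=0)), ("p", ds(x=0)))}}}
m2 = {"events": {"e"}, "init": {("i", ds(y=0))},
      "rel": {"e": {(("i", ds(y=0)), ("j", ds(y=1)))}}}
gamma, rel, gamma0 = edts_product(m1, m2)
```

In the example, the shared event e can fire only once, because the second component has no further e-transition from j, while the local event a keeps interleaving.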


An obvious question is how the semantic parallel composition constructor is 
related to the syntactic parallel composition of operational ed specifications. 


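Proposition 1. Let $O_1, O_2$ be operational ed specifications with composable signatures. Then $\mathrm{Mod}(O_1) \otimes \mathrm{Mod}(O_2) \subseteq \mathrm{Mod}(O_1 \parallel O_2)$, where $\mathrm{Mod}(O_1) \otimes \mathrm{Mod}(O_2)$ denotes $\kappa_\otimes(\mathrm{Mod}(O_1), \mathrm{Mod}(O_2))$.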
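The converse $\mathrm{Mod}(O_1 \parallel O_2) \subseteq \mathrm{Mod}(O_1) \otimes \mathrm{Mod}(O_2)$ does not hold: consider the ed signature $\Sigma = (E, A)$ with $E = \{e\}$, $A = \emptyset$, and the operational ed specifications $O_i = (\Sigma, C_i, T_i, (c_{i,0}, \varphi_{i,0}))$ for $i \in \{1, 2\}$ with $C_1 = \{c_{1,0}\}$, $T_1 = \{(c_{1,0}, \mathit{true}, e, \mathit{false}, c_{1,0})\}$, $\varphi_{1,0} = \mathit{true}$; and $C_2 = \{c_{2,0}\}$, $T_2 = \emptyset$, $\varphi_{2,0} = \mathit{true}$. Then $\mathrm{Mod}(O_1) = \emptyset$, since the transition of $O_1$ must be enabled in the initial configuration but its effect $\mathit{false}$ cannot be satisfied; hence also $\mathrm{Mod}(O_1) \otimes \mathrm{Mod}(O_2) = \emptyset$. But $\mathrm{Mod}(O_1 \parallel O_2) = \{M\}$ with $M$ showing just the initial configuration, since the shared event $e$ has no transition in $O_2$ and therefore none in $O_1 \parallel O_2$.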

The next theorem shows the usefulness of the syntactic parallel composi- 
tion operator for proving implementation correctness when a (semantic) parallel 
composition constructor is involved. The theorem is a direct consequence of 
Proposition 1 and Definition 8. 


Theorem 3. Let $Sp$ be an (axiomatic or operational) ed specification, $O_1, O_2$ operational ed specifications with composable signatures, and $\kappa$ an implementation constructor from $\Sigma(O_1) \otimes \Sigma(O_2)$ to $\Sigma(Sp)$. If $Sp \rightsquigarrow_{\kappa} O_1 \parallel O_2$, then $Sp \rightsquigarrow_{\kappa_\otimes; \kappa} (O_1, O_2)$.
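Spelled out, the argument behind Theorem 3 chains Proposition 1 with the hypothesis $Sp \rightsquigarrow_{\kappa} O_1 \parallel O_2$. The following derivation is a reconstruction from the statements above, not the authors' proof text. For arbitrary $M_1 \in \mathrm{Mod}(O_1)$ and $M_2 \in \mathrm{Mod}(O_2)$:

```latex
\begin{align*}
\kappa_\otimes(M_1, M_2) &\in \mathrm{Mod}(O_1) \otimes \mathrm{Mod}(O_2)
  \subseteq \mathrm{Mod}(O_1 \parallel O_2)
  && \text{(Proposition 1)}\\
(\kappa_\otimes;\kappa)(M_1, M_2) = \kappa(\kappa_\otimes(M_1, M_2)) &\in \mathrm{Mod}(Sp)
  && \text{(Definition 8 for } Sp \rightsquigarrow_{\kappa} O_1 \parallel O_2\text{)}
\end{align*}
```

By Definition 8 with $n = 2$, this is exactly $Sp \rightsquigarrow_{\kappa_\otimes;\kappa} (O_1, O_2)$.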


Example 6. We finish the refinement chain for the ATM specifications by applying a decomposition into two parallel components. The operational specification ATM of Example 3 (and Example 5) describes the interface behaviour of an ATM interacting with a user. For a concrete realisation, however, an ATM will also interact internally with other components, e.g., a clearing company which supports the ATM in verifying PINs. Our last refinement step hence realises the ATM specification by two parallel components, represented by the operational specification ATM' in Fig. 2a and the operational specification CC of a clearing company in Fig. 2b. Both communicate (via shared events) when an ATM sends a verification request, modelled by the event verifyPIN, to the clearing company. The clearing company may answer with correctPIN or wrongPIN, and then the ATM continues following its specification. For the implementation construction we use the parallel composition constructor $\kappa_\otimes$ from $(\Sigma(\mathit{ATM}'), \Sigma(\mathit{CC}))$ to $\Sigma(\mathit{ATM}') \otimes \Sigma(\mathit{CC})$. The signature of CC consists of the events shown on the transitions in Fig. 2b. Moreover, there is one integer-valued attribute cnt counting the number of verification tasks performed. The signature of ATM' extends $\Sigma(\mathit{ATM})$ by the events verifyPIN, correctPIN and wrongPIN. To fit the signature and the behaviour of the parallel composition of ATM' and CC to the specification ATM, we must therefore compose $\kappa_\otimes$ with an event refinement constructor $\kappa_\alpha$ such that $\alpha(\mathit{enterPIN}) = \mathit{enterPIN}; \mathit{verifyPIN}; (\mathit{correctPIN} + \mathit{wrongPIN})$; for the other events $\alpha$ is the identity, and for the attributes it is the inclusion. The idea is therefore that the refinement looks like $\mathit{ATM} \rightsquigarrow_{\kappa_\otimes; \kappa_\alpha} (\mathit{ATM}', \mathit{CC})$. To prove this refinement relation we rely on the syntactic parallel composition $\mathit{ATM}' \parallel \mathit{CC}$ shown in Fig. 2c, and on Theorem 3. It is easy to see that $\mathit{ATM} \rightsquigarrow_{\kappa_\alpha} \mathit{ATM}' \parallel \mathit{CC}$. In fact, all transitions for event enterPIN in Fig. 1 are split into several transitions in Fig. 2c according to the event refinement defined by $\alpha$. For instance, the loop transition from PIN to PIN with precondition $\mathit{trls} < 2$ in Fig. 1 is split into
the cycle from (PIN, Idle) via (PINEntered, Idle) and (Verifying, Busy) back to (PIN, Idle) in Fig. 2c. Thus, we have $\mathit{ATM} \rightsquigarrow_{\kappa_\alpha} \mathit{ATM}' \parallel \mathit{CC}$ and can apply Theorem 3 such that we get $\mathit{ATM} \rightsquigarrow_{\kappa_\otimes; \kappa_\alpha} (\mathit{ATM}', \mathit{CC})$.

[Figure: (a) Operational ed specification ATM' with control states Card, PIN, PINEntered, Verifying and Return; (b) Operational specification CC of a clearing company with control states Idle and Busy, attribute cnt, and transitions verifyPIN // cnt' = cnt, correctPIN // cnt' = cnt + 1 and wrongPIN // cnt' = cnt + 1; (c) Syntactic parallel composition ATM' ∥ CC with control states (Card, Idle), (PIN, Idle), (PINEntered, Idle), (Verifying, Busy) and (Return, Idle).]

Fig. 2. Operational ed specifications ATM', CC and their parallel composition
5 Conclusions


We have presented a novel logic, called $\mathcal{E}^{\downarrow}$-logic, for the rigorous formal development of event-based systems incorporating changing data states. To the best of our knowledge, no other logic supports the full development process for this kind of system, ranging from abstract requirements specifications, expressible by the dynamic logic features, to the concrete specification of implementations, expressible by the hybrid part of the logic.

The temporal logic of actions (TLA [13]) also supports stepwise refinement, where state transition predicates are considered as actions. In contrast to TLA, we also model the events which cause data state transitions. For writing concrete specifications we have proposed an operational specification format capturing (at least parts of) similar formalisms, like Event-B [1], symbolic transition systems [17], and UML protocol state machines [16]. A significant difference from Event-B machines is that we distinguish between control and data states, the former being encoded as data in Event-B. On the other hand, Event-B supports parameters of events, which could be integrated in our logic as well. An institution-based semantics of Event-B has been proposed in [7], which coincides with our semantics of operational specifications for the special case of deterministic state transition predicates. Similarly, our semantics of operational specifications coincides with the unfolding of symbolic transition systems in [17] if we instantiate our generic data domain with algebraic specifications of data types (and again consider only deterministic state transition predicates). The syntax of UML protocol state machines is about the same as the one of operational event/data specifications. As a consequence, all of the aforementioned concrete specification formalisms (and several others) would be appropriate candidates for integration into a development process based on $\mathcal{E}^{\downarrow}$-logic.

There remain several interesting tasks for future research. First, our logic is not yet equipped with a proof system for deriving consequences of specifications. This would also support the proof of refinement steps, which is currently achieved by purely semantic reasoning. A proof system for $\mathcal{E}^{\downarrow}$-logic must cover dynamic and hybrid logic parts at the same time, like the proof system in [15], which, however, does not consider data states, and the recent calculus of [5], which extends differential dynamic logic but does not deal with events and reactions to events. Both proof systems could be appropriate candidates for incorporating the features of $\mathcal{E}^{\downarrow}$-logic. Another issue concerns the separation of events into input and output, as in I/O-automata [14]. Then communication compatibility (see [2] for interface automata without data and [3] for interface theories with data) would also become relevant when applying a parallel composition constructor.
Abstract. We present Pyro, a framework for enabling domain-specific modeling via the internet. Provided with an adequate metamodel specification, Pyro turns your browser into a collaborative, domain-specific, graphical development environment with features reminiscent of desktop IDEs for textual programming languages. The required metamodeling is supported in a high-level, simplicity-driven fashion, and the entire ready-to-run browser-based domain-specific development environment is generated fully automatically. We illustrate the steps of this development along the realization of a graphical IDE for the Architecture Analysis and Design Language (AADL).


1 Introduction 


Domain-specific languages (DSLs) aim at closing the gap between domain knowl- 
edge and software development by explicitly supporting the required domain 
concepts. Graphical domain-specific languages have turned out to be particu- 
larly suitable for domain experts without any programming background. The 
bottleneck in practice is the enormous effort to develop the required domain- 
specific graphical modeling tools. The CINCO SCCE Meta Tooling Suite [26] has 
been designed to overcome this bottleneck by providing a holistic, simplicity- 
driven [22] approach for the creation of such domain-specific graphical modeling 
tools. A key feature of CINCO is that it generates the entire graphical modeling 
environment (referred to as ‘CINCO Products’ in the remainder of the paper) 
from high-level specifications of the defined model structures and functionali- 
ties. The (translational) semantics of the specified modeling language is defined 
in terms of code generation, model transformation, evaluation, and/or interpre- 
tation [20]. CINCO Products are Eclipse-based, graphical modeling tools that are 
realized via a number of Eclipse plug-ins [13]. Thus, setting up a CINCO Prod- 
uct involves some technical aspects that are beyond the competence of typical 
domain experts, and it becomes even more tedious when one wants to enable a 
cooperative development. 

In this paper, we present Pyro, a tool that enables one to generate CINCO 
Products for collaborative modeling that run in a web browser. Conceptually, 
Pyro borrows from modern online editors for collaborative work, like Google 
Docs, Microsoft Office 365, or solutions like ShareLaTeX/Overleaf that even free one from maintaining a corresponding build and runtime environment.

[Figure: CINCO Product specification models; CINCO Generator producing the generated parts of an Eclipse-based CINCO Product; Pyro Generator producing the generated parts of a Pyro modeling environment on top of the Pyro runtime.]

Fig. 1. Cinco generation architecture.

Key to the realization of Pyro is that CINCO follows a fully generative approach on the meta level, which allows one to modularly ‘retarget’ the CINCO Product Generation for the web (cf. Fig. 1). Technically, Pyro web modeling environments utilize DyWA [27] (Dynamic Web Application) for data modeling, empowering prototype-driven application development.

In order to achieve this retargeting and to enable collaborative work, Pyro 
needs to, in particular, compensate for all the required functionality provided 
by the Eclipse platform, like the EMF framework with GMF or Graphiti for 
graphical editors. Altogether, this poses the following three key challenges: 


— Developing an adequate web solution for the metamodel-based model han- 
dling (API, persistence, event system, etc.) that in the Eclipse world is pro- 
vided by the EMF framework [33] (see Architecture Backend, Sect. 3.1). 

— Developing a frontend on top of these model structures that feels like a modern 
integrated development environment with a graphical editor for the models, 
which in the Eclipse world is provided by the Rich Client Platform (RCP) [24] 
and the Graphiti editor framework [2] (see Architecture Frontend, Sect. 3.2). 

— Enabling real-time live collaborative working on models, which is not foreseen 
in an offline client like Eclipse (see Collaborative Editing, Sect. 4). 


In the course of this tool paper, Pyro is illustrated along the development of a 
graphical modeling environment for the Architecture Analysis and Design Lan- 
guage (AADL), an SAE standard for modeling the architecture of embedded 
real-time systems [29]. CINCO was used to develop a graphical AADL modeling 
tool supporting a subset of AADL’s features tailored to be used in teaching [28], 
where it replaces the graphical editor of the OSATE tool [8] (AADL’s reference implementation). Furthermore, a dedicated code generator was developed to support verification of behavior specified in the BLESS language [17]. Another example of Pyro realizing a DSL, for point-and-click adventures, can be found in [21].

Fig. 2. Pyro web-based modeling environment for the AADL language.

Figure 2 shows the web-based graphical AADL editor in Pyro¹. We will use this editor in the remainder of this paper to illustrate CINCO’s and Pyro’s core ideas and concepts. The user interface is designed after commonly known concepts from integrated development environments, like Eclipse or IntelliJ. The main area in the center is covered by the modeling canvas showing the currently edited model. On the right, there is the palette showing the available types of modeling elements. They can be placed onto the canvas just by drag and drop. The attributes of the currently selected element in the editor can be set via the properties view at the bottom. The validation view (bottom right corner) constantly checks the syntax and static semantics of the model in the canvas and provides appropriate error or warning messages. Finally, a project explorer and a menu bar complete the IDE-like appearance.


The remainder of the paper is organized as follows: Sect. 2 briefly describes the use of CINCO’s specification languages to define a sophisticated graphical modeling language, and Sect. 3 explains the generation of a web-based environment and the resulting architecture. The mechanisms and techniques used to enable simultaneous collaboration are explained in Sect. 4. The paper closes with a summary, related work, and an outlook on future development in Sect. 5.

1 The editor is available for experimentation on the Pyro website: https://pyro.scce.info.


2 DSL Development with Cinco 


CINCO is a language workbench [11] for the simplicity-driven development of 
graphical modeling environments that are domain-specific [12], support full code 
generation [10,15], and easily integrate existing solutions in the form of ser- 
vices [23]. As CINCO is itself a meta-level application of these principles [25], it 
is specialized to the domain of ‘graph-based graphical modeling tools’ and fully 
generates such tools from meta-level descriptions (models) — the key enabling 
factor for the whole Pyro approach. Primarily relevant in this regard are two CINCO metamodeling languages:²


1. The Meta Graph Language (MGL) allows for the definition of the abstract 
syntax of the developed language, i.e., which types of language elements exist 
and how they can be related. In the context of AADL, this means, for instance, 
that a system model consists of devices, processes and threads, and that all 
of them have ports (of different types) that can be connected with data/in- 
formation flow edges. 

2. The Meta Style Language (MSL) is used to specify the concrete graphical 
syntax of those MGL-defined concepts by means of simple hierarchical shapes 
and their appearance (such as color, line type/width, etc.). As can be seen in 
Fig. 2, for instance, devices are depicted by a black thick line rectangle, while 
threads appear as a grey dashed line parallelogram. 


With these meta-level specification files, the Cinco Product Generator (which is part of CINCO) generates plug-ins for the Eclipse Rich Client Platform (RCP) that realize the editor based on the Eclipse Modeling Framework (EMF) and the Graphiti graphical editor framework. Further additions to the editor, which are not covered by these two specification files, can be injected in an aspect-oriented fashion [16]: CINCO provides a mechanism of so-called hooks that are triggered on the occurrence of certain events, for instance, when a node is created, moved, or deleted. Hooks are inserted into the MGL file with annotations on the model elements defined therein. The effect of a hook can either be modeled in a transformation language [20] or be written directly as Java code using the generated model API. In the context of the AADL editor, e.g., a postMoveHook is used to move a port to the nearest border within its container after it has been moved by the user. This results in a very natural ‘snapping to the border’ effect during modeling.
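The geometric core of such a snapping hook can be sketched independently of CINCO. The Python function below is hypothetical: it does not use CINCO’s generated model API or its actual hook signature (in CINCO such hooks are written in Java). It assumes an axis-aligned rectangular container and a port given by a single point, clamps that point into the container, and then moves it onto the closest edge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float        # left
    y: float        # top
    width: float
    height: float


def snap_to_nearest_border(px: float, py: float, box: Rect) -> tuple:
    """Move point (px, py) onto the closest edge of box (hypothetical helper)."""
    # Clamp first so that ports dragged outside the container are handled too.
    cx = min(max(px, box.x), box.x + box.width)
    cy = min(max(py, box.y), box.y + box.height)
    distances = {
        "left": cx - box.x,
        "right": box.x + box.width - cx,
        "top": cy - box.y,
        "bottom": box.y + box.height - cy,
    }
    side = min(distances, key=distances.get)   # ties resolved in dict order
    if side == "left":
        return (box.x, cy)
    if side == "right":
        return (box.x + box.width, cy)
    if side == "top":
        return (cx, box.y)
    return (cx, box.y + box.height)


box = Rect(0, 0, 100, 50)
print(snap_to_nearest_border(60, 45, box))   # (60, 50)
```

Keeping the coordinate along the chosen edge unchanged makes the port slide to the border perpendicularly, which matches the intuitive ‘snapping’ described above; a real editor would also have to account for the port’s own size.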

As CINCO follows a fully generative approach, the very same specification files are utilized by Pyro to generate a web-based modeling editor that runs in the browser (cf. Fig. 1). Of course, in this context, the running platform is no longer based on Eclipse, but on common web frameworks like Angular for the frontend and Java EE for the backend. The aspects of a CINCO Product that are included in a service-oriented fashion via native components written in Java (for instance, a code generator or editor-assisting features like the hooks discussed above) can thus also be run directly in the backend of the Pyro editor.

2 For a more elaborate introduction on how to define a graphical editor with CINCO, as well as other case studies and exemplary modeling languages, please refer to [26].

In the following, we will focus on two particularly important aspects of Pyro: 
After discussing the frontend/backend architecture of the generated Pyro mod- 
eling environments in Sect. 3, we will take a deeper look at the communication 
pattern between the involved components that facilitates synchronous collabo- 
rative modeling (cf. Sect. 4). 


3 Architecture 


In contrast to developing an Eclipse-based modeling environment, realizing a
web-based solution means starting nearly from scratch. Eclipse itself is built on
a huge number of plug-ins, developed over the past seventeen years. In particular,
the Eclipse Modeling Project provides many frameworks for developing modeling
languages based on metamodels and bundling them into a rich IDE. On the web, the
development of integrated environments has only just started, so that only a few
best practices, plug-ins, and frameworks are available. This means that even
fundamental features often have to be implemented to enable basic functionality.
The main difference between local desktop IDEs and a web-based environment like
Pyro is the opportunity to provide distributed access to a centralized instance by
multiple users at the same time. This results in new challenges and requirements
regarding the synchronization between multiple users and the resolution of
conflicting modifications.

Thus, the Pyro architecture must be built in a way that adequately substi- 
tutes what Eclipse already provides in the desktop application context, but also 
be prepared for the distributed setting with multiple users — in particular for 
supporting live collaborative editing on the same models. In this section, the 
generation of Pyro web-based modeling environments is described in a way that 
shows how the needed information is collected from CINCO’s high-level specifi- 
cation metamodels and where the generated code is placed and distributed in 
the overall context to build the Pyro architecture. 

The previously introduced specification of the AADL modeling language con- 
stitutes the source for the tool generation step. After the Pyro generator is trig- 
gered, all MGL and MSL files for a CINCO-based modeling tool are collected to 
gather the required information. At this point, all modeling languages, including 
their available node and edge types, are visible for the generator. 

In the next step, a template of the modeling environment web application is
created. The gray parts with dotted borders in Fig. 3 show the static elements
independent of the given language specification, whereas the blue parts with solid
borders are specifically generated from this specification. The template consists
of a DyWA-based backend, extended by a specific Domain Layer for communication. On
the client side, some general parts provide Registration, Login, and Project
Management, but the main component is the specific Editor generated to handle
instances of the graphical modeling language. The underlying single-page web
application framework Angular Dart [1] is utilized to enable the required features
of a rich internet application, like versatile user interaction and asynchronous
communication.

Fig. 3. Overall architecture of the generated web-based modeling environment.

Essentially, the backend solves the challenge of providing metamodel-based model
handling (persistence, API, event handling, etc.), which in the CINCO desktop
client world is provided by the EMF framework. The frontend, on the other hand,
realizes the rich IDE-like frame application with the graphical editor for the
models. In the following, these two parts are explained in more detail to show how
the different layers are connected and which parts are generated to establish the
entire integrated environment.


3.1 Backend 


The backend of a modeling environment generated using Pyro consists of two
main layers: one is responsible for the centralized persistence of model instances,
the other for receiving and distributing modifications. The lowest level of the
web application is the database, which stores information in a centralized fashion.
This layer handles the representation of the predefined metamodels of the given
domain-specific languages. Pyro modeling environments utilize the DyWA as an
abstraction layer over a database to store types and objects in a dynamic and
loosely coupled fashion [27]. Based on the node and edge types of the specified
languages, Pyro generates a Domain Data Plug-in (see Fig. 4), which declares types,
associations, attributes, and inheritance. The main reason for using the DyWA as
model layer is its Domain Generator, which generates a specific DyWA API providing
entities and controllers for the previously given types to handle their instances
on a simplified layer above the database. This API closely resembles the APIs
generated by EMF in the Eclipse world, which greatly reduces the effort of
generating the required CINCO API adapters. These adapters provide functionality
with signatures identical to those of EMF, so that already existing code can be
applied directly (see below). Beyond that, DyWA is prepared for dynamic changes
of the metamodel, which become necessary during modeling language evolution
(see [19]).

Fig. 4. Backend component architecture and interaction.

Since CINCO supports extending the definition of graphical modeling languages
by user-written Java code for hooks, actions, validation checks, and code
generators, a holistic reuse mechanism has to be provided in the context of
Pyro. To meet this goal, the same CINCO interfaces are rebuilt in the generated
web-based modeling environment, providing the same structure and identical
signatures. As a result, the domain-specific interfaces generated by Pyro (see
Fig. 4, CINCO API) are compatible with those CINCO generates for Eclipse and EMF,
so that these extensions can use them identically. In contrast to the desktop-based
CINCO Product, a Pyro graph model instance is not persisted in a file on the local
system. As a distributed system, the Pyro web modeling environment uses the DyWA
database on a server for storage and centralized access. Thus, the CINCO API is
internally connected to the corresponding generated DyWA API to persist changes in
the database, which is hidden from the extensions.
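The layering just described follows the adapter idea: extensions call a CINCO-style API, and each call is forwarded to a storage API that hides the database. The sketch below is a minimal illustration under that assumption; `InMemoryStore` is a hypothetical stand-in for the generated DyWA API, and `GraphModel` and `Node` are hypothetical stand-ins for generated CINCO API types (written as Python methods rather than Java getters and setters).

```python
import itertools


class InMemoryStore:
    """Hypothetical stand-in for the generated DyWA API (central storage)."""

    def __init__(self):
        self._rows = {}
        self._ids = itertools.count(1)

    def create(self, type_name, **attrs):
        oid = next(self._ids)
        self._rows[oid] = {"type": type_name, **attrs}
        return oid

    def update(self, oid, **attrs):
        self._rows[oid].update(attrs)

    def read(self, oid):
        return dict(self._rows[oid])  # return a copy, not the stored row


class Node:
    """CINCO-style node facade; persistence is delegated to the store."""

    def __init__(self, store, oid):
        self._store, self._oid = store, oid

    def get_x(self):
        return self._store.read(self._oid)["x"]

    def move_to(self, x, y):
        # Extension code only sees move_to(); the row update stays hidden.
        self._store.update(self._oid, x=x, y=y)


class GraphModel:
    def __init__(self, store):
        self._store = store

    def new_node(self, type_name, x, y):
        return Node(self._store, self._store.create(type_name, x=x, y=y))
```

Because extension code only ever talks to `GraphModel` and `Node`, the same calls would work against a file-based backend, which is the kind of reuse the identical signatures are meant to enable.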

Multi-user collaborative editing with the generated domain-specific modeling
languages is one of the main challenges for Pyro. All changes to a centrally held
instance of a graph model have to be shared with all participants. To distribute
the changes performed on a graph model by calling CINCO API methods, a Command
Stack is used to store each individual modification. Since CINCO provides hooks
for aspect-oriented extensions, a single action like the movement of a node on the
canvas can result in multiple successive commands. As a result, all modifications
on a model or any of its elements at runtime are encoded in commands and stored
sequentially on the stack. The commands recorded during CINCO API usage are used
to synchronize different


108 P. Zweihoff et al. 


clients looking at the same model, as well as to realize undo and redo
functionality. This synchronization mechanism is described in more detail in
Sect. 4.
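A command stack in which one user action may record several commands (the user's move plus follow-up commands issued by a hook) can be sketched as follows. Grouping the commands of one action into a single undo entry is an assumption of this sketch, chosen so that undo reverts a hook's effects together with the move that triggered them; `MoveCommand` and `CommandStack` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveCommand:
    node_id: str
    old: tuple
    new: tuple

    def apply(self, positions):
        positions[self.node_id] = self.new

    def revert(self, positions):
        positions[self.node_id] = self.old


class CommandStack:
    def __init__(self, positions):
        self.positions = positions
        self._undo, self._redo = [], []

    def execute(self, *commands):
        """Apply all commands of one user action and record them as one entry."""
        for cmd in commands:
            cmd.apply(self.positions)
        self._undo.append(list(commands))
        self._redo.clear()  # a new action invalidates the redo history

    def undo(self):
        group = self._undo.pop()
        for cmd in reversed(group):  # revert in reverse order of application
            cmd.revert(self.positions)
        self._redo.append(group)

    def redo(self):
        group = self._redo.pop()
        for cmd in group:
            cmd.apply(self.positions)
        self._undo.append(group)
```

For example, a port moved from (0, 0) to (12, 7) and then snapped by a hook to (10, 7) is recorded as two commands, and a single undo brings it back to (0, 0).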

To make the web modeling environment behave like a desktop application, user
interaction must not be blocked. Thus, Pyro utilizes REST-based asynchronous
communication for non-blocking data exchange. As a result, the outermost
component of the generated web application is a REST Interface. The interface
consists of Static Endpoints for project, file, and user management, which are
independent of the given modeling languages. These parts are supplemented by
generated Endpoints, which are based on the CINCO specification and provide
methods to create, read, update, and delete (CRUD) a single graph model. In
addition, the interface contains the central endpoint for commands sent from a
client’s frontend to the backend. Depending on the Extensions used, additional
Endpoints are generated to fetch and trigger user-written actions or a generator.


3.2 Frontend 


To mimic the look and feel of a local desktop modeling environment, the web- 
based variant generated by Pyro has to provide versatile user interactions. As a 
result of this, the Frontend of the generated web application (see Fig. 5), which 
realizes the interface for the user, is focused on quick responses and familiar input 
behavior. To achieve this goal, the frontend part of a web modeling environment 
is built upon the Angular Dart [1] framework, which is used to realize single- 
page web applications with built-in cross-platform support and comprises an 
architecture focused on reusable components. In addition, it is tailored to
asynchronous user interaction and client-side routing, so that it can be used to
build rich internet applications, such as ones resembling integrated development
environments (IDEs).

In contrast to a local desktop application, a web application requires addi- 
tional multi-user focused interfaces. Therefore, the template for the frontend, 
which is initially created, consists of static user interfaces for Registration and 
Login as well as a Project Management area to create, edit, and share projects. 
The specifically generated parts are used by the Editor, which comprises
domain-specific components. Its user interface is similar to the familiar Eclipse
IDE used by regular CINCO Products (see Fig. 2).

Delays in the system’s response to user input, which would hinder fluent
interaction, can be prevented by avoiding synchronous communication with the
backend. The editor facilitates this frontend-side computation by two layers used
to interact with instances of the graph models. The Mirror Layer stores a snapshot
of the model present in the database, whereas the Interaction Layer is a direct
representation of the visible graph, which can be modified by the user. This
separation establishes a delta between the last valid graph, stored in the Mirror
Layer, and the currently visible graph. Thanks to this, generated syntactical
validators (e.g., for ensuring lower bounds of given cardinalities) can raise
errors, and the appropriate rollback operation takes effect immediately on the
client side without additional communication with the backend.
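The two-layer idea can be sketched with plain dictionaries. The mirror holds the last valid snapshot, the interaction layer holds what the user sees, and local validators decide whether to keep an edit or roll back to the mirror without contacting the backend. The class `EditorLayers` and the validator `at_least_one_port` (a lower-bound check for a hypothetical `Component` type that needs at least one port) are invented for illustration.

```python
import copy


class EditorLayers:
    def __init__(self, graph, validators):
        self.mirror = copy.deepcopy(graph)       # last valid state (as in the database)
        self.interaction = copy.deepcopy(graph)  # state currently visible to the user
        self.validators = validators

    def edit(self, change):
        """Apply `change` to the visible graph, then validate it locally."""
        change(self.interaction)
        errors = [msg for check in self.validators for msg in check(self.interaction)]
        if errors:
            # Roll back to the last valid snapshot; no backend round trip needed.
            self.interaction = copy.deepcopy(self.mirror)
        else:
            # Accepted: at this point the delta would be sent to the backend.
            self.mirror = copy.deepcopy(self.interaction)
        return errors


def at_least_one_port(graph):
    """Hypothetical generated validator: every Component needs at least one Port."""
    return [
        f"{name} has no port"
        for name, node in graph.items()
        if node["type"] == "Component" and not node["ports"]
    ]
```

Deep copies keep the sketch simple; an editor handling large models would more likely track only the delta between the two layers.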

Pyro specifically aims at supporting users switching from existing CINCO
Products to the web-based modeling environment. Thus, the Editor, which is the
main part of the frontend, provides multiple components similar to those of the
Eclipse IDE. To avoid confusing users, their functions, behavior, and arrangement
are recreated. Besides common user interface parts like a project explorer and a
menu, specific components for the modeling environment are generated, such as the
Canvas, a Properties View, and the Palette.

The Canvas is based on the JointJS framework [9], which renders SVGs and adds
versatile user interaction for manipulating nodes and edges via drag & drop. This
allows the web modeling environment running in a browser to provide handling very
similar to that of the Eclipse-based desktop application with its Graphiti editor.
The exact replication of the node and edge appearance is a central goal of the
generated Canvas. Ideally, a user cannot distinguish between a Pyro and a CINCO
visualization of a graph model. This requires the same hierarchical shape structure
on the web as in the Graphiti editor, which can be realized with scalable vector
graphics (SVGs). The SVG Markup, which defines the shapes and styling information
of the nodes and edges, is generated from the concrete syntax specified in the MSL
files of CINCO. The JointJS framework and the SVG Markup files are observed by a
domain-specific User Event Controller, which realizes the listeners and stream
handling mechanisms for a single graph model to modify the underlying layers.
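How a nested shape description could be turned into SVG markup is sketched below. The dictionary-based shape format (keys `kind`, `x`, `y`, `width`, `height`, `fill`, `value`, `children`) is a hypothetical stand-in for MSL, which is not reproduced here; only the idea of mapping a shape hierarchy onto nested SVG elements is taken from the text.

```python
from xml.sax.saxutils import escape


def to_svg(shape, indent=0):
    """Recursively render a hypothetical shape tree as nested SVG elements."""
    pad = "  " * indent
    kind = shape["kind"]
    x, y = shape.get("x", 0), shape.get("y", 0)
    if kind == "text":
        return f'{pad}<text x="{x}" y="{y}">{escape(shape["value"])}</text>'
    if kind == "rect":
        lines = [
            f'{pad}<g transform="translate({x},{y})">',
            f'{pad}  <rect width="{shape["width"]}" height="{shape["height"]}" '
            f'fill="{shape.get("fill", "white")}" stroke="black"/>',
        ]
        # Children are positioned relative to their parent via the group transform.
        lines += [to_svg(child, indent + 1) for child in shape.get("children", [])]
        lines.append(f"{pad}</g>")
        return "\n".join(lines)
    raise ValueError(f"unsupported shape kind: {kind}")
```

Using a translated `<g>` per container keeps child coordinates relative to their parent, which matches the idea of a hierarchical shape structure.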

Besides the distinct and visible modifications available directly in the Canvas,
attributes of an edge, a node, or the graph model (as defined in the MGL
metamodel) can be modified using the Properties View. It has a generic frame
based on a tree view to recursively walk through the associated types of the
currently selected element. For every type present in an MGL file, a form for
editing the primitive attributes (e.g., string, Boolean, or integer) is generated.
The individual fields are tailored to the specified data type of the attribute to
give as much support as possible. Thanks to the two-way data binding of the
underlying Angular framework, every change to an attribute is immediately
propagated to the underlying layer.
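The per-type form generation can be pictured as a mapping from primitive attribute types to input widgets. The type names, the attribute description format, and the widget descriptors below are simplified assumptions made for illustration, not the generator's actual output.

```python
# Hypothetical mapping from primitive attribute types to HTML input widgets.
WIDGETS = {
    "string": {"tag": "input", "type": "text"},
    "boolean": {"tag": "input", "type": "checkbox"},
    "integer": {"tag": "input", "type": "number", "step": "1"},
    "double": {"tag": "input", "type": "number", "step": "any"},
}


def form_for(type_name, attributes):
    """Build one form descriptor for a model element type.

    `attributes` maps attribute names to type names; non-primitive types
    (e.g. references to other model elements) are skipped here, since the
    tree view navigates those instead of editing them in a field.
    """
    fields = []
    for attr, attr_type in attributes.items():
        widget = WIDGETS.get(attr_type)
        if widget is None:
            continue
        fields.append({"name": attr, **widget})
    return {"type": type_name, "fields": fields}
```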

The Palette is generated based on the given MGL specifications. It lists all
node types available for modeling. In addition, the optional annotations of the
MGL, e.g., for grouping nodes and for dedicated icons as visual support, are
considered as well.


4 Collaborative Editing 


One of the main features of modeling environments generated by Pyro is the
simultaneous editing of graph models by multiple clients. The continuous
synchronization between clients removes the need for classical revision control
repositories for distributed access and instead enables immediate collaboration.
To reach the goal of simultaneous synchronization, different aspects have been
considered to maintain consistency and scalability and to achieve a real-time effect.

In this section, the mechanism used for Pyro web-based modeling environ- 
ments to communicate is presented and explained. The first part discusses the 
different challenges of a distributed system with respect to the domain of graph- 
ical modeling environments, whereas the second part describes the realization of 
the command pattern used to exchange modifications on a graph model. 


4.1 Simultaneous Synchronization Mechanism 


The main communication concept of a modeling environment generated by Pyro,
viewed as a distributed system, is the optimistic replication strategy [30]. This
concept replicates data and allows the individual replicas to diverge, which in the
context of Pyro is realized by the separate graph model replicas held in each client.
Optimistic replication follows the eventual consistency model and is furthermore
classified as basically available, soft state, and eventually consistent (BASE)
[36]. It benefits from high availability, since it only exchanges updates on given
items. In the context of a web-based modeling environment, the updates are based
on the modifications a client can make to a node or edge. To enable conflict
resolution and maintain consistency regarding commutativity and idempotency,
conflict-free replicated data types (CRDTs) are represented by commands. CRDTs
were originally used for text-based synchronization as a simplification of
operational transformation [34]. They utilize an additional data structure, based
on an identifier of the client, the changed value, and the position, to create a
unique identifier for each changed character of the text. Regarding the graph
models handled by Pyro, CRDTs are realized by commands for each type of possible
model element modification, which store a unique identifier and the changed
properties of the relevant element. In addition, the previous values of the
updated properties are stored as well, to enable rollback, undo, and redo
functionality. Thus, Pyro uses operation-based and state-based CRDTs.
Thanks to the CRDTs, conflicts arising from simultaneous edits of the same model
element can be detected. In the context of graphical DSLs, conflicts can arise
from violations of the static semantics defined in the metamodel. If a conflict is
detected, the corresponding command is flagged for rollback and returned to its
sender. The client then inverts the modification encoded by the command and
applies it to revert the conflicting change.
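A command that carries the element identifier, the new property values, and the previous values makes inversion straightforward, as the sketch below shows. The conflict rule used here (reject a command when the stored values no longer equal its recorded previous values, i.e., another client changed them first) is one plausible rule chosen for illustration; the text only states that conflicts are detected and that rejected commands are inverted by their sender. `UpdateCommand` and `receive` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UpdateCommand:
    element_id: str
    new: dict        # changed properties and their new values
    old: dict        # previous values, kept for rollback, undo, and redo
    client_id: str
    rollback: bool = False

    def inverted(self):
        """Swap new and old values so applying the result undoes this command."""
        return UpdateCommand(self.element_id, dict(self.old), dict(self.new), self.client_id)


def receive(model, cmd):
    """Server side: apply `cmd`, or flag it for rollback on a conflict."""
    current = model[cmd.element_id]
    # Assumed conflict rule: the values this command expected to overwrite changed.
    stale = any(current.get(key) != value for key, value in cmd.old.items())
    if stale:
        cmd.rollback = True  # returned to the sender, which applies cmd.inverted()
        return cmd
    current.update(cmd.new)
    return cmd
```

With two clients moving the same node from x = 0 concurrently, the first command wins and the second is flagged, so its sender reverts its local replica before receiving the winning change.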


4.2 Distributed Command Pattern 


The distribution of modifications made to a graph model in the Pyro web modeling
environment is realized by the command pattern [14], a behavioral design pattern
that encapsulates all information needed to perform an update on an object. The
commands are sent as HTTP POST requests that carry the identifiers of the graph
model and of the client. An exemplary collaboration of two clients (red and green)
modifying the same graph model simultaneously is presented in Fig. 6.

After the initial read from the database, a client only calculates, exchanges,
and receives commands when a modification is made (see Fig. 6(1)). For every
possible change on nodes and edges (e.g., moving a node or bending an edge), a
dedicated command encoding the modification is created and sent to the server,
extended with a unique identifier of the sender. Thanks to this assignment, all
commands can be told apart (see the red commands by client A and the green
commands by client B in Fig. 6). As an example, the command for the creation of
a node consists of the node type, the position, and an identifier of the container
in which it should be instantiated. Other commands, e.g., the move node commands,
contain the previous as well as the new position, so that they store the delta of
the modification.
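The two command kinds mentioned above might be serialized roughly as follows. The field names and the `"type"` discriminator are assumptions for illustration, not Pyro's actual wire format; the point is that a create command names the node type, position, and container, whereas a move command records both positions, so its delta can be inverted.

```python
import json
import uuid
from dataclasses import dataclass, asdict


@dataclass
class CreateNode:
    node_type: str
    x: int
    y: int
    container_id: str
    sender_id: str


@dataclass
class MoveNode:
    node_id: str
    old_x: int
    old_y: int
    new_x: int
    new_y: int
    sender_id: str

    def inverted(self):
        """The stored delta allows the move to be undone or rolled back."""
        return MoveNode(self.node_id, self.new_x, self.new_y,
                        self.old_x, self.old_y, self.sender_id)


def to_payload(command):
    """Serialize with a type discriminator so the receiver can rebuild the class."""
    return json.dumps({"type": type(command).__name__, **asdict(command)})


CLIENT_A = str(uuid.uuid4())  # hypothetical unique sender identifier
```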

The Serializer (see Fig. 6(2)) is used to parse the received payload and assign 
the commands to the associated Command Applier. Thanks to additional reflec- 
tive type annotations, the received payload can be parsed to recreate the correct 
command type. The assignment depends on the given graph model type the 
command belongs to. 
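The role of the type annotations in the Serializer can be sketched with a registry keyed by a type tag. The tag selects the command class, and the graph model type selects the Command Applier. The registry, the `type` and `model_type` keys, and the applier lookup table are hypothetical.

```python
import json
from dataclasses import dataclass

COMMAND_TYPES = {}


def command(cls):
    """Register a command class under its name, mimicking a reflective type tag."""
    COMMAND_TYPES[cls.__name__] = cls
    return cls


@command
@dataclass
class MoveNode:
    node_id: str
    new_x: int
    new_y: int


@command
@dataclass
class DeleteNode:
    node_id: str


def parse(payload, appliers):
    """Rebuild the typed command and pick the applier for its graph model type."""
    data = json.loads(payload)
    cls = COMMAND_TYPES[data.pop("type")]
    applier = appliers[data.pop("model_type")]
    return applier, cls(**data)
```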

The Command Applier (see Fig. 6(3)) is the main component of the web
server, since it receives, validates, and executes the commands. Every
modification encoded by a command is first validated against the syntactical
constraints defined by the graph model type. In the case of a constraint
violation, the command is inverted based on the given delta and returned to undo
the invalid operation sent from a client. After a successful validation, the
modification encoded by the command is applied to the generated domain-specific
API, which also triggers the annotated hooks and finally modifies the node or edge
instances in the central database. Modifications performed on the API itself
(e.g., by a hook implementation) are again internally encoded as commands for
further distribution to other clients. The updates resulting from the hook
execution inside the API are combined with the initial command to be interpreted
as a single transaction, shown by the packages in Fig. 6. To ensure consistency
between the sender of a command and the other clients, the initiator is also
informed about modifications that arise internally from hook execution. All
commands collected during the execution of the initial modification are
broadcast to the other listening clients (see Fig. 6(4)). This mechanism uses
persistent bidirectional connections, so that clients can subscribe to changes
made to their currently open graph model.

Fig. 6. Concept of the distributed command pattern. (Color figure online)
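The processing steps of the Command Applier (validate, invert on violation, apply so that hooks fire, bundle everything into one transaction, and broadcast) are sketched below. The constraint, the grid-snapping hook `snap_to_grid`, and the list of listener callbacks standing in for the bidirectional connections are all assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class Move:
    node: str
    old: tuple
    new: tuple
    sender: str

    def inverted(self):
        return Move(self.node, self.new, self.old, self.sender)


def snap_to_grid(cmd, grid=10):
    """Hypothetical post-move hook: snap the x coordinate to a grid."""
    x, y = cmd.new
    snapped = (round(x / grid) * grid, y)
    return [] if snapped == cmd.new else [Move(cmd.node, cmd.new, snapped, cmd.sender)]


class CommandApplier:
    def __init__(self, positions, constraint, hook, listeners):
        self.positions = positions    # stand-in for the central database
        self.constraint = constraint  # syntactic check: Move -> bool
        self.hook = hook              # Move -> list of follow-up Moves
        self.listeners = listeners    # callbacks standing in for client connections

    def handle(self, cmd):
        if not self.constraint(cmd):
            return [cmd.inverted()]  # returned to the sender to undo its local change
        transaction = [cmd]
        self.positions[cmd.node] = cmd.new
        # Changes made by the hook are recorded as commands of the same transaction.
        for follow_up in self.hook(cmd):
            self.positions[follow_up.node] = follow_up.new
            transaction.append(follow_up)
        for notify in self.listeners:
            notify(transaction)  # every client, including the initiator, is informed
        return transaction
```

A move to (23, 5) thus ends up as a two-command transaction whose second command snaps the node to (20, 5), while a move that violates the constraint is answered only with its inverse and is not broadcast.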

The commands received by a client (see Fig. 6(5)) are parsed and inspected to
ensure that commands initiated by the client itself are ignored. New changes
from other clients are applied to all layers and displayed on the canvas. In
addition, the client is notified about received changes. Updates caused by
self-sent commands (e.g., a modification performed during a hook execution) are
only partially applied, to guarantee that nodes and edges are not modified twice.
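On the client side, the filtering rule can be sketched as follows: skip a transaction's initiating command if this client sent it, but still apply the follow-up commands produced by hooks, which the client has not seen yet. Treating the first command of a transaction as the initiating one, and representing commands as `(sender, node, new_position)` tuples, are assumptions of this sketch.

```python
def on_transaction(client_id, transaction, positions, notifications):
    """Apply a broadcast transaction of (sender, node, new_position) tuples.

    `positions` is the client's local replica; `notifications` collects
    messages shown to the user about changes made by other clients.
    """
    initiator = transaction[0][0]
    own = initiator == client_id
    for index, (sender, node, new_position) in enumerate(transaction):
        if own and index == 0:
            continue  # already applied locally when the user made the change
        positions[node] = new_position  # hook results or foreign changes
        if not own:
            notifications.append(f"{sender} changed {node}")
```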

The command pattern applied to the generated modeling environments is tailored
to enable real-time collaborative editing. The main design decisions focus on
scalability and high availability by means of BASE and CRDTs. For graphical
DSLs, the operational approach realized with this command pattern is more
suitable than a protocol designed for textual languages like the Language Server
Protocol (LSP) [3]. The main difference between the command pattern and the LSP
is the way modifications of the model are distributed. In contrast to the
presented communication protocol of Pyro, the LSP propagates changed regions of a
text document, so the intention of a modification has to be evaluated afterwards,
whereas in graphical DSLs the commands directly represent the change that
occurred.

5 Conclusion and Perspectives


We have presented Pyro, a framework for enabling domain-specific modeling via 
the internet. Provided with an adequate metamodel specification, Pyro turns 
a browser into a collaborative, domain-specific, graphical development envi- 
ronment with features reminiscent of desktop IDEs for programming textual 
languages. The required metamodeling is supported in a high-level, simplicity- 
driven fashion: The MGL describes the available node types, edge types, and 
syntactical constraints, whereas the MSL defines the visual appearance of the 
modeling artifacts defined in the MGL. Based on these specifications, the entire 
ready-to-run browser-based domain-specific development environment is gener- 
ated fully automatically, as has been illustrated along the construction of a 
graphical development environment for the Architecture Analysis and Design 
Language (AADL). 

The field of web-based development environments is still quite young, so that
not many related solutions exist yet. There are the aforementioned collaborative
online text editors like Google Docs, Microsoft Office 365, and ShareLaTeX/Overleaf,
but in the area of DSLs and modeling, we have so far only encountered WebGME [5],
an (early stage) online adaptation of Vanderbilt University’s Generic Modeling
Environment [18], and Theia [4], a cross-platform web and desktop IDE for textual
DSLs. In addition, itemis (the German company that significantly contributed to
the well-known Xtext [6] DSL framework) is currently working on a platform called
‘Convecton’, which aims at bringing modeling with and execution of domain-specific
languages online into the cloud [35]. However, none of these solutions provides
Pyro-like, graphical, collaborative modeling support.

Pyro is still at an early stage of development, and there is a lot of room for
improvement, like further enhancing and easing the graphical modeling features,
or improving the performance of collaborative modeling by taking advantage of
peer-to-peer communication. Pyro is envisioned to enable cross-competence
collaboration on a single project in a domain/purpose-specific fashion according
to the Language-Driven Engineering (LDE) paradigm [31]. LDE aims at allowing the
different stakeholders to formulate their intents in the way they are used to,
i.e., in their domain language, restricted in such a way that the efforts of the
other involved stakeholders are maintained, or, as we say, constitute Archimedean
points [32] of the considered domain-specific language. Currently, we are starting
to explore the impact of the Pyro technology on a larger scale for DIME [7], our
framework for developing Web applications.
Abstract. Model synchronization, i.e., the task of restoring consistency
between two interrelated models after a model change, is a challenging task.
Triple Graph Grammars (TGGs) specify model consistency by means of rules. They
can be used to automatically derive specifications of edit operations for single
models and repair rules that propagate model changes to related models. To
support model (re-)synchronization activities more effectively, a construction
mechanism for short-cut rules has recently been developed. Short-cut rules
describe consistency-preserving complex edit operations across model boundaries.
We show that edit and repair rules can be derived from short-cut rules. As a
proof of concept, we implemented the construction and application of short-cut
edit and repair rules in eMoflon. Our evaluation shows that short-cut-rule-based
repair processes considerably decrease data loss and improve runtime compared to
former model synchronization processes in eMoflon.


Keywords: Model synchronization · Triple Graph Grammars · Short-cut rule


1 Introduction 


Model-driven engineering has become an important technique to cope with the
increasing complexity of modern software systems. In the field of Concurrent
Engineering [7], for example, products are no longer realized sequentially;
instead, development tasks are carried out in parallel. Each of these tasks has
its own view onto the product and, as a view evolves, it may become inconsistent
with the other ones. Keeping views synchronized by checking and preserving their
consistency can be a challenging problem, which is not only subject to ongoing
research but also of practical interest for industrial applications such as the
one stated above.

Triple Graph Grammars (TGGs) [24] are a declarative, rule-based bidirectional
transformation approach that aims to synchronize models stemming from different
views (usually called domains in the TGG literature). Their purpose is to define
a consistency relationship between pairs of models in a rule-based manner by
defining traces between their elements. Given a finite set of rules that define
how both models co-evolve, a TGG can be automatically operationalized into source
and forward rules. The source rules of an operationalized TGG can be used to build
up models of one domain, while forward rules translate them to models of the other
domain, thereby establishing traces between their elements. From a synchronization
point of view, source rules specify edit operations to change one model, while
forward rules specify repair operations to synchronize model changes with one
another [16,19,24]. Even though both the translation and the synchronization
process are formally defined and sound, several practical issues arise for model
synchronization from (potentially transitive) dependencies between rule
applications: To synchronize changed models, popular TGG approaches do not always
fix inconsistencies locally but revert all dependent rule applications and start a
retranslation process. However, this kind of synchronization often deletes and
recreates many model elements to reestablish model consistency, potentially losing
information that is local to just one model and wasting processing time. Existing
solutions to this problem are rather ad hoc and come without any guarantee of
reestablishing the consistency of modified models [12,14].

© The Author(s) 2019

As a new solution to this synchronization problem, we derive repair rules from
short-cut rules [8], which we recently introduced to handle complex
consistency-preserving model updates more effectively and efficiently. The
construction of short-cut rules is a kind of sequential rule composition that
allows replacing a rule application with another one while preserving the involved
model elements (instead of deleting and re-creating them). We used short-cut rules
to describe model changes that exchange one edit step for another one. Since in
this paper we want to use short-cut rules for model synchronization as well, they
have to be operationalized into source and forward rules.
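As a rough intuition for what a short-cut rule achieves, rules can be abstracted to sets of named elements: each monotonic TGG rule has context elements it requires and elements it creates. Replacing an application of `r1` by one of `r2` while identifying some created elements (a simplified stand-in for a common kernel) preserves those elements, deletes only the rest of what `r1` created, and creates only what `r2` adds beyond them. This set-based view is a deliberate simplification for illustration; the construction in [8] works on graphs and morphisms, and the element names below (edges written as `p->subP`, correspondence links as `subP<->subF`) are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    context: frozenset  # elements that must exist and are preserved
    created: frozenset  # elements the rule creates


def short_cut(r1, r2, kernel):
    """Net effect of replacing an application of r1 by one of r2.

    `kernel` maps elements created by r1 to elements created by r2 that are
    identified with them and are therefore preserved instead of being
    deleted and re-created.
    """
    if not set(kernel) <= r1.created or not set(kernel.values()) <= r2.created:
        raise ValueError("kernel must relate elements created by r1 and r2")
    preserved = frozenset(kernel)
    return {
        "delete": r1.created - preserved,
        "create": r2.created - frozenset(kernel.values()),
        "preserve": preserved,
    }
```

For instance, replacing a Sub-Rule application by a Root-Rule application while identifying the created Package, Folder, and correspondence link would only delete the two edges to the former parents, which is exactly the kind of change that turns a sub Package into a root Package without re-creating it.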

Our formal contributions (in Sect. 4) are two-fold: As short-cut rules may be
non-monotonic, i.e., may be deleting, we formalize the operationalization of
non-monotonic TGG rules, which decomposes short-cut rules into (semantically
equivalent sequences of) source (edit) and forward (repair) rules. Moreover, we
obtain sufficient conditions under which an application of a short-cut rule
preserves the consistency of related pairs of models. This was left to future work
in [8]. Together, this establishes the correctness of our approach using
operationalized short-cut rules for model synchronization.

Practically, we implement our synchronization approach in eMoflon [21], a
state-of-the-art bidirectional graph transformation tool, and evaluate it (Sect. 5).
The results show that the construction of short-cut repair rules enables us to
react to model changes in a less invasive way, preserving information and
increasing performance. We thus contribute to a broader research trend in the
bx-community towards Least Change synchronization [5]. Before presenting these
results in detail, we illustrate our approach using an example in Sect. 2 and
recall some preliminaries in Sect. 3. Finally, we discuss related work in Sect. 6
and conclude with pointers to future work in Sect. 7. A technical report that
includes additional preliminaries, all proofs, and the rule set used for our
evaluation (including more complex examples) is available online [9].


2 Introductory Example 


We motivate the use of short-cut repair processes by synchronizing a Java AST
(abstract syntax tree) model and a custom documentation model. For model
synchronization, we consider a Java AST model as source model and its
documentation model as target model, i.e., changes in a Java AST model have to be
transferred to its documentation model. Correspondence links between the two
models relate their elements.


Fig. 1. Example: TGG rules (Root-Rule, Sub-Rule, Leaf-Rule). (Color figure online)
Fig. 2. Example: TGG forward rules (Root-FWD-Rule, Sub-FWD-Rule, Leaf-FWD-Rule).


TGG rules. Figure 1 shows the rule set of our running example, consisting of
three TGG rules: Root-Rule creates a root Package together with a root Folder
and a correspondence link in between. This rule has an empty precondition
and only creates elements, which are depicted in green and with the annotation
(++). Sub-Rule creates a Package and Folder hierarchy, given that an already
correlated Package and Folder pair exists. Finally, Leaf-Rule creates a Class and
a Doc-File under the same precondition as Sub-Rule.
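To make the three rules concrete, the sketch below applies them directly to a triple graph stored as plain Python structures (source nodes, target nodes, and correspondence links). The representation and function names are hypothetical and not eMoflon's API; each function creates the elements that the corresponding rule annotates with (++), and Sub-Rule and Leaf-Rule check their shared precondition that the given Package and Folder are already correlated.

```python
from dataclasses import dataclass, field


@dataclass
class TripleGraph:
    """Hypothetical triple graph: source, target, and correspondence links."""
    source: dict = field(default_factory=dict)  # element name -> (type, parent)
    target: dict = field(default_factory=dict)  # element name -> (type, parent)
    corr: set = field(default_factory=set)      # (source name, target name)


def _require_correlated(g, pkg, folder):
    # Shared precondition of Sub-Rule and Leaf-Rule.
    if (pkg, folder) not in g.corr:
        raise ValueError(f"{pkg} and {folder} are not correlated")


def root_rule(g, pkg, folder):
    """Empty precondition: create a root Package, a root Folder, and a link."""
    g.source[pkg] = ("Package", None)
    g.target[folder] = ("Folder", None)
    g.corr.add((pkg, folder))


def sub_rule(g, parent_pkg, parent_folder, pkg, folder):
    """Create a sub Package and a sub Folder below a correlated pair."""
    _require_correlated(g, parent_pkg, parent_folder)
    g.source[pkg] = ("Package", parent_pkg)
    g.target[folder] = ("Folder", parent_folder)
    g.corr.add((pkg, folder))


def leaf_rule(g, parent_pkg, parent_folder, cls, doc):
    """Create a Class and a Doc-File below a correlated pair."""
    _require_correlated(g, parent_pkg, parent_folder)
    g.source[cls] = ("Class", parent_pkg)
    g.target[doc] = ("Doc-File", parent_folder)
    g.corr.add((cls, doc))
```

Applying Root-Rule, then Sub-Rule twice, then Leaf-Rule yields a small consistent hierarchy with four correspondence links, built up on both sides in lockstep.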

These rules can be used to generate consistent triple graphs, consisting of
source, correspondence, and target graph, in a synchronized way. A more general
scenario of model synchronization, however, is to restore the consistency of a
triple graph that has been altered on just one side. For this purpose, each TGG
rule has to be operationalized into two kinds of rules: source rules enable changes
of source models, which are followed by translating these models to the target
domain with forward rules. As source rules for single models are just projections
of TGG rules to one domain, we do not show them explicitly.


Forward translation rules. Figure 2 depicts the forward rules. Using these rules,
we can translate the Java AST model depicted on the source side of the triple
graph in Fig. 3(a) to a documentation model such that the result is the complete
graph in Fig. 3(a). To obtain this result, we apply Root-FWD-Rule at the root


Efficient Model Synchronization 119 


Package, Sub-FWD-Rule at Packages p and subP, and finally Leaf-FWD-Rule at Class c. To guide the translation process, context elements that have already been translated are annotated with ☑ in forward rules. A formerly created source element gets the marking ☐ → ☑ to indicate that applying the rule will mark this element as translated; a formalization of this marking is given in [20]. Note that Root-FWD-Rule can always be applied when Sub-FWD-Rule is applicable, which can lead to untranslated edges. For simplicity, we assume that the correct rule is applied, which in practice can be achieved through negative application conditions [15].
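Marking-based forward translation can be sketched as a fixpoint loop: an untranslated element is translated only after its context (its parent Package) has been marked. The Python sketch below is a simplification and not eMoflon's algorithm. It encodes the source model as parent maps and picks Root-FWD-Rule only for Packages without a parent, which stands in for the negative application condition mentioned above. Target names are invented by prefixing the source names.

```python
def forward_translate(source: dict, parent: dict):
    """source: id -> 'Package' | 'Class'; parent: child id -> parent Package id."""
    marked = set()                      # source elements already marked as translated
    corr, target, target_parent = {}, {}, {}
    changed = True
    while changed:                      # repeat until no forward rule is applicable
        changed = False
        for elem, typ in source.items():
            if elem in marked:
                continue
            par = parent.get(elem)
            if par is not None and par not in marked:
                continue                # context not translated yet
            if typ == "Package":
                # Root-FWD-Rule without parent, Sub-FWD-Rule otherwise
                new, new_type = "F_" + elem, "Folder"
            elif par is not None:
                new, new_type = "D_" + elem, "Doc-File"   # Leaf-FWD-Rule
            else:
                continue                # a Class without a Package: no rule applies
            target[new] = new_type
            if par is not None:
                target_parent[new] = corr[par]
            corr[elem] = new
            marked.add(elem)            # mark elem as translated
            changed = True
    return corr, target, target_parent, marked


source = {"rootP": "Package", "p": "Package", "subP": "Package", "c": "Class"}
parent = {"p": "rootP", "subP": "p", "c": "subP"}
corr, target, target_parent, marked = forward_translate(source, parent)
```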


Fig. 3. Exemplary synchronization scenario, panels (a), (b), and (c) (source side: rootP, p, subP: Package and c: Class; target side: rootF, f, subF: Folder and cDoc, subPDoc: Doc-File)


Model synchronization. Given the triple graph in Fig. 3(a), a user might want to change a sub Package such as p into a root Package, e.g., when the project is split up into multiple projects. Since p was created and translated as a sub Package rather than a root element, this change introduces an inconsistency. To resolve this issue, one approach is to revert the translation of p into f and re-translate p with an appropriate translation rule such as Root-FWD-Rule. Reverting the former translation step may lead to further inconsistencies, as we remove elements that were needed as context elements by other rule applications. The result is a reversion of all translation steps except for the first one, which translated the original root element; it is shown in Fig. 3(b). Now, we can re-translate the unmarked elements, yielding the result graph in Fig. 3(c). This example reveals two drawbacks of this synchronization approach. First, it may delete and re-create many similar structures, which is inefficient. Second, it may lose information that exists on the target side only, e.g., a use case may be assigned to a document that has no representation in the corresponding Java project.
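The cascade of reversions is a transitive closure over rule applications. Reverting one application also reverts every application that used one of its created elements as context. In the sketch below, the application names and element sets are hypothetical and model the steps behind Fig. 3(a), without subPDoc. For forward steps, the "created" set stands for the new target elements together with the marking of the source element.

```python
def revert_cascade(applications: dict, start: str) -> set:
    """applications: name -> (context elements, created elements)."""
    reverted = {start}
    removed = set(applications[start][1])      # elements deleted so far
    changed = True
    while changed:
        changed = False
        for name, (context, created) in applications.items():
            if name not in reverted and context & removed:
                reverted.add(name)             # lost part of its context: revert too
                removed |= created
                changed = True
    return reverted


steps = {
    "root": (set(), {"rootP", "rootF"}),
    "sub_p": ({"rootP", "rootF"}, {"p", "f"}),
    "sub_subP": ({"p", "f"}, {"subP", "subF"}),
    "leaf_c": ({"subP", "subF"}, {"c", "cDoc"}),
}
```

Reverting `sub_p` reverts every step except `root`, which matches the situation in Fig. 3(b).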


Model synchronization with short-cut repair. In [8] we introduced short-cut rules as a kind of rule composition mechanism that allows one to replace a rule application by another one while preserving elements (instead of deleting and re-creating them). In our example, Root-Rule and Sub-Rule overlap in elements, as the first rule can be completely embedded into the latter one. Figure 4 depicts two possible short-cut rules based on Root-Rule and Sub-Rule. While the upper short-cut rule replaces Root-Rule with Sub-Rule, the lower short-cut rule replaces Sub-Rule with Root-Rule. Both short-cut rules preserve the model elements on both sides and solely create elements that do not yet exist (++), or delete those depicted in red and annotated with (--). They are constructed by overlapping both original rules such that each created element that can be mapped to the other rule becomes context and, as such, is not touched. When a created element cannot be mapped because it only appears in the replacing rule, it is created. Consequently, an element is deleted if the created element only appears in the replaced rule. Finally, context elements occurring in both rules also appear in the short-cut rule, while overlapped context elements appear only once. Using Sub-To-Root-SC-Rule enables the user to transform the triple graph in Fig. 3(a) directly into the one in Fig. 3(c).

Fig. 4. Short-cut rules Root-To-Sub-SC-Rule and Sub-To-Root-SC-Rule (Color figure online)
Fig. 5. Repair rules Root-To-Sub-Repair-Rule and Sub-To-Root-Repair-Rule

Yet, these rules still cannot cope with the change of a single model, since short-cut rules transform both models at once, as TGG rules usually do. Hence, to handle the deleted edge between rootP and p, we have to forward operationalize short-cut rules, thereby obtaining short-cut repair rules. Figure 5 depicts the short-cut repair rules derived from the short-cut rules in Fig. 4. A non-monotonic TGG rule is forward operationalized by removing deleted elements from the rule's source graphs, as they should not be present after a source rule application. Short-cut repair rules allow us to propagate source graph changes directly to target graphs to restore consistency. In our example, after Package p has been transformed into a root element, the rule of choice is Sub-To-Root-Repair-Rule, which transforms Folder f in Fig. 3(a) into a root element and deletes the superfluous Doc-File. The result is again the consistent triple graph depicted in Fig. 3(c). This repair allows us to skip the costly reversion process with the intermediate result in Fig. 3(b). Note that applying Sub-To-Root-Repair-Rule at arbitrary matches may have undesired consequences: one could, e.g., delete the edge between two Folders even if the matched Packages are still connected. Our Theorem 8 characterizes matches where such violations of the language of the grammar cannot happen. In our implementation, we exploit an incremental pattern matcher to identify valid matches. Using suitable negative application conditions [6] would be an alternative approach.
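A guarded application of Sub-To-Root-Repair-Rule can be sketched on a simplified encoding with parent maps and a correspondence map. This is a hypothetical illustration. The function name and the encoding are ours, and the deletion of the Doc-File is omitted. The guard checks only the single violation named above, namely that the matched Packages are still connected. It does not check the general condition of Theorem 8.

```python
def sub_to_root_repair(source_parent: dict, target_parent: dict,
                       corr: dict, pkg: str) -> bool:
    """Turn the Folder corresponding to pkg into a root Folder, preserving the Folder."""
    if pkg in source_parent:
        return False              # Packages still connected: invalid match
    folder = corr.get(pkg)
    if folder is None or folder not in target_parent:
        return False              # nothing to repair
    del target_parent[folder]     # delete only the Folder edge; f and its content stay
    return True


# After the user edit: the source edge rootP -> p has been deleted.
source_parent = {"subP": "p", "c": "subP"}
target_parent = {"f": "rootF", "subF": "f", "cDoc": "subF"}
corr = {"rootP": "rootF", "p": "f", "subP": "subF", "c": "cDoc"}
```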
3 Preliminaries


To understand our formal contributions, we assume familiarity with the basics of double-pushout rewriting in graph transformation and, more generally, in adhesive categories [6,18], as well as with the definition of TGGs and, in particular, their operationalizations [24]. Here, we recall the non-basic preliminaries for our work, which are the construction of short-cut rules, the notion of sequential independence, and a (simple) categorical definition of partial maps.

In [8], we introduced short-cut rules as a new way of sequential composition 
for monotonic rules. Given an inverse rule of a monotonic rule (i.e., a rule that 
only deletes) and a monotonic rule, a short-cut rule combines their respective 
actions into a single rule. Its construction allows one to identify elements that are deleted by the first rule as re-created by the second one. These elements are preserved in the resulting short-cut rule. A common kernel, i.e., a common subrule
of both, serves to identify how the two rules overlap and which elements are 
preserved instead of being deleted and re-created. We recall their construction 
since our construction of repair rules is based on it. Examples are depicted in 
Fig. 4. 


Definition 1 (Short-cut rule). In an adhesive category $\mathcal{C}$, given two monotonic rules $r_i: L_i \hookrightarrow R_i$, $i = 1, 2$, and a common kernel rule $k: L_\cap \hookrightarrow R_\cap$ for them, the short-cut rule $r_1^{-1} \ltimes_k r_2 := (L \hookleftarrow K \hookrightarrow R)$ is computed by executing the following steps depicted in Figs. 6 and 7:

1. The union $L_\cup$ of $L_1$ and $L_2$ along $L_\cap$ is computed as pushout (2).
2. The LHS $L$ of the short-cut rule $r_1^{-1} \ltimes_k r_2$ is computed as pushout (3a).
3. The RHS $R$ of the short-cut rule $r_1^{-1} \ltimes_k r_2$ is computed as pushout (3b).
4. The interface $K$ of the short-cut rule $r_1^{-1} \ltimes_k r_2$ is computed as pushout (4).
5. Morphisms $l: K \hookrightarrow L$ and $r: K \hookrightarrow R$ are obtained by the universal property of $K$.

Fig. 6. Construction of LHS and RHS of short-cut rule $r_1^{-1} \ltimes_k r_2$
Fig. 7. Construction of interface $K$ of $r_1^{-1} \ltimes_k r_2$
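For finite sets with injective morphisms, the pushouts of Definition 1 reduce to unions, provided that exactly the elements identified by the kernel share a name. The Python sketch below states this naming assumption as assertions and computes $L$, $K$, and $R$. It is a set-level illustration, not a categorical implementation. The example is a simplified variant of Sub-To-Root-SC-Rule. Here $r_1$ is Sub-Rule (the replaced rule) and $r_2$ is Root-Rule (the replacing rule), and only Packages, Folders, one correspondence node, and parent edges are modelled. The element names are hypothetical.

```python
def short_cut(L1, R1, L2, R2, Lk, Rk):
    """Set-level sketch of r1^{-1} ⋉_k r2 for monotonic rules L_i ⊆ R_i."""
    # Naming convention: only elements identified by the kernel k share names.
    assert L1 <= R1 and L2 <= R2 and Lk <= Rk
    assert L1 & L2 == Lk
    assert (R1 - L1) & (R2 - L2) == Rk - Lk
    assert (R1 - L1).isdisjoint(L2) and (R2 - L2).isdisjoint(L1)
    L_union = L1 | L2   # pushout (2): union of L1 and L2 along L_cap
    L = L_union | R1    # pushout (3a): LHS of the short-cut rule
    R = L_union | R2    # pushout (3b): RHS of the short-cut rule
    K = L_union | Rk    # pushout (4): interface, keeps what the kernel re-creates
    return L, K, R


L1 = {"rootP", "rootF", "rootC"}                        # Sub-Rule context
R1 = L1 | {"p", "f", "pC", "rootP->p", "rootF->f"}       # Sub-Rule creations
L2, R2 = set(), {"p", "f", "pC"}                         # Root-Rule, mapped onto p, f, pC
Lk, Rk = set(), {"p", "f", "pC"}                         # kernel: the overlap
L, K, R = short_cut(L1, R1, L2, R2, Lk, Rk)
deleted, created = L - K, R - K
```

In this simplified variant, only the two parent edges are deleted and nothing is created. The Package p and the Folder f are preserved instead of being re-created.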
Sequential independence of two rule applications intuitively means that neither of these applications enables the other one. This implies that the order of their application may be switched. The definition of sequential independence can be extended to a sequence of more than two rule applications. In Theorem 8, we will use this to identify language-preserving applications of short-cut rules.


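Definition 2 (Sequential independence). Given two rules $p_i = (L_i \xleftarrow{l_i} K_i \xrightarrow{r_i} R_i)$ with $i = 1, 2$, two direct transformations $G \Rightarrow_{p_1, m_1} H_1$ and $H_1 \Rightarrow_{p_2, m_2} H_2$ via the rules $p_1$ and $p_2$ are sequentially independent if there exist two morphisms $d_1: R_1 \to D_2$ and $d_2: L_2 \to D_1$ such that $n_1 = e_2 \circ d_1$ and $m_2 = f_1 \circ d_2$. Here, $m_1: L_1 \to G$ and $m_2: L_2 \to H_1$ are the matches, $n_1: R_1 \to H_1$ is the co-match of the first transformation, and the two double-pushout diagrams yield the spans

$$G \xleftarrow{e_1} D_1 \xrightarrow{f_1} H_1 \xleftarrow{e_2} D_2 \xrightarrow{f_2} H_2.$$

Given rules $p = (L \leftarrow K \rightarrow R)$ and $p_i = (L_i \leftarrow K_i \rightarrow R_i)$ with $1 \le i \le t$, a transformation $G_t \Rightarrow_{p, m} H$ is sequentially independent from a sequence of transformations $t: G_0 \Rightarrow_{p_1, m_1} G_1 \Rightarrow_{p_2, m_2} \cdots \Rightarrow_{p_t, m_t} G_t$ if, first, $G_t \Rightarrow_{p, m} H$ and $G_{t-1} \Rightarrow_{p_t, m_t} G_t$ are sequentially independent and then, the arising transformations $G_{t-1} \Rightarrow_{p, e_t \circ d_t} G'_{t-1}$ and $G_{t-2} \Rightarrow_{p_{t-1}, m_{t-1}} G_{t-1}$ are sequentially independent, and so forth back to the transformations $G_0 \Rightarrow_{p_1, m_1} G_1$ and $G_1 \Rightarrow_{p, e_2 \circ d_2} G'_1$ (where $e_i: D_i \to G_{i-1}$ is given by the transformation and $d_i: L \to D_i$ exists by sequential independence as in the figure above).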
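For graphs with injective matches, the existence of $d_1$ and $d_2$ corresponds to a well-known element-level criterion. The second application must use no element created by the first, which is the condition for $d_2$ to exist. It must also delete no element that the first application produced or preserved, which is the condition for $d_1$ to exist. The Python sketch below summarizes each direct transformation by the host elements it uses, deletes, and creates. The `Step` encoding and the element names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Element-level summary of one direct transformation."""
    used: frozenset      # m(L): matched LHS elements
    deleted: frozenset   # m(L) minus m(K)
    created: frozenset   # n(R) minus n(K)

    @property
    def result(self) -> frozenset:
        """n(R): preserved plus created elements."""
        return (self.used - self.deleted) | self.created


def sequentially_independent(s1: Step, s2: Step) -> bool:
    """s1 is applied first, s2 is applied to the result of s1."""
    d2_exists = s2.used.isdisjoint(s1.created)    # s2 needs nothing s1 created
    d1_exists = s1.result.isdisjoint(s2.deleted)  # s2 deletes nothing s1 produced or kept
    return d2_exists and d1_exists


def step(used=(), deleted=(), created=()) -> Step:
    return Step(frozenset(used), frozenset(deleted), frozenset(created))


root = step(created={"rootP"})
sub = step(used={"rootP"}, created={"p"})
leaf_a = step(used={"p"}, created={"c1"})
leaf_b = step(used={"p"}, created={"c2"})
```

Here `sub` depends on `root`, while the two Leaf steps in the same Package can be switched.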


To formalize the application of non-monotonic TGG rules, we need to consider triple graphs with partial morphisms from correspondence to source (or target) graphs. For expressing such triple graphs categorically, we recall a simple definition of partial morphisms [23] to be used in Sect. 4.1. An elaborate theory of triple graphs with partial morphisms is beyond the scope of this paper.


Definition 3 (Partial morphism, commuting square with partial morphisms). A partial morphism $a$ from an object $A$ to an object $B$ is (an equivalence class of) a span $A \xhookleftarrow{\iota_A} A' \xrightarrow{a'} B$ where $\iota_A$ is a monomorphism (denoted by $\hookrightarrow$). A partial morphism is denoted as $a: A \dashrightarrow B$; $A'$ is called the domain of $a$. A diagram with two partial morphisms $a$ and $c$ as depicted as square (1) in Fig. 8 is said to be commuting if there exists a (necessarily unique) morphism $x: A' \to C'$ between their domains such that both arising squares (2) and (3) in Fig. 9 commute.

Fig. 8. Square of partial morphisms
Fig. 9. Commuting square of partial morphisms
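For finite sets, a partial morphism can be represented as a dictionary whose keys form its domain $A'$. The sketch below checks the commutativity condition of Definition 3 under this representation. Two points are assumptions: the total morphisms $A \to C$ and $B \to D$ in Figs. 8 and 9 are named `f` and `g`, and $x$ is read as a morphism between the domains $A'$ and $C'$.

```python
def commutes(a: dict, c: dict, f: dict, g: dict) -> bool:
    """a: A --> B and c: C --> D partial (dict keys = domain); f: A -> C, g: B -> D total."""
    for elem, b in a.items():
        image = f[elem]
        if image not in c:
            return False          # square (2): f must map dom(a) into dom(c)
        if c[image] != g[b]:
            return False          # square (3): c(f(elem)) must equal g(a(elem))
    return True                   # x is then f restricted to dom(a), hence unique


f = {"a1": "c1", "a2": "c2"}
g = {"b1": "d1"}
a = {"a1": "b1"}                  # a2 lies outside the domain of a
c_ok = {"c1": "d1"}
c_bad = {"c2": "d1"}              # c1 lies outside the domain of c
```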


4 Constructing Language-Preserving Repair Rules 


The general idea of this paper is to use short-cut repair rules to optimize the model synchronization process based on TGGs. To this end, we operationalize short-cut rules constructed from the rules of a given TGG. Since those rules are not necessarily monotonic, we generalize the well-known operationalization of TGG rules to the non-monotonic case and show that the basic property is still valid: An application of a source rule followed by an application of the corresponding forward rule is equivalent to applying the original rule instead. This is the content of Sect. 4.1. When constructing short-cut rules in [8], we identified the following problem: Applying a short-cut rule derived from rules of a given grammar might lead to an instance that is not part of the language defined by that grammar. Therefore, in Sect. 4.2, we provide sufficient conditions under which applications of short-cut rules lead only to instances of the grammar-defined language. Combining both results ensures the correctness of our approach, i.e., a short-cut repair rule actually propagates a model change from the source to the target model if it is correctly matched.


4.1 Operationalization of Generalized TGG Rules 


Since the operationalization of TGG rules has been introduced for monotonic 
rules only, we extend the theory to general triple rules and, moreover, allow 
for partial morphisms from correspondence to source and target graph in triple 
graphs. We split a rule on triple graphs into a source rule that only affects the 
source part and a forward rule that affects correspondence and target part. 


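Definition 4 (TGG rule). Let the category of triple graphs and graph morphisms be given. A triple rule $p$ is a span of triple graph morphisms

$$p = \Big( (L_S \xleftarrow{\sigma_L} L_C \xrightarrow{\tau_L} L_T) \xleftarrow{(l_S, l_C, l_T)} (K_S \xleftarrow{\sigma_K} K_C \xrightarrow{\tau_K} K_T) \xrightarrow{(r_S, r_C, r_T)} (R_S \xleftarrow{\sigma_R} R_C \xrightarrow{\tau_R} R_T) \Big),$$

which, wherever possible, is abbreviated by

$$p = \big( L_{SCT} \xleftarrow{(l_S, l_C, l_T)} K_{SCT} \xrightarrow{(r_S, r_C, r_T)} R_{SCT} \big).$$

Rules $p_S$ and $p_F$ are called source rule and forward rule of $p$:

$$p_S = \Big( (L_S \leftarrow \emptyset \rightarrow \emptyset) \xleftarrow{(l_S, \mathrm{id}_\emptyset, \mathrm{id}_\emptyset)} (K_S \leftarrow \emptyset \rightarrow \emptyset) \xrightarrow{(r_S, \mathrm{id}_\emptyset, \mathrm{id}_\emptyset)} (R_S \leftarrow \emptyset \rightarrow \emptyset) \Big),$$

$$p_F = \big( R_S L_C L_T \xleftarrow{(\mathrm{id}_{R_S}, l_C, l_T)} R_S K_C K_T \xrightarrow{(\mathrm{id}_{R_S}, r_C, r_T)} R_S R_C R_T \big)$$

with $\emptyset$ being the empty graph. In $R_S L_C L_T = (R_S \dashleftarrow L_C \xrightarrow{\tau_L} L_T)$, the morphism from $L_C$ to $R_S$ may be partial and is defined by the span $(L_C \xhookleftarrow{l_C} K_C \xrightarrow{r_S \circ \sigma_K} R_S)$ with $\sigma_K: K_C \to K_S$. Target and backward rules $p_T$ and $p_B$ are defined symmetrically in the other direction.

Given a TGG, a short-cut repair rule is a forward rule $p_F$ of a short-cut rule $p = r_1^{-1} \ltimes_k r_2$ where $r_1, r_2$ are (monotonic) rules of the TGG, i.e., a repair rule is an operationalized short-cut rule.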
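At the level of element sets, Definition 4 gives a recipe. The source rule keeps the source components and empties the other two. The forward rule fixes $R_S$ on the source side and keeps the correspondence and target components. The Python sketch below represents a rule as a mapping from `"S"`, `"C"`, `"T"` to $(L, K, R)$ triples of sets. It deliberately ignores the morphisms between the components, including the partial one from $L_C$ to $R_S$. The example rule is a hypothetical, simplified short-cut rule.

```python
EMPTY = frozenset()


def fs(*items):
    return frozenset(items)


def source_rule(p: dict) -> dict:
    """p_S: source components of p, empty correspondence and target components."""
    return {"S": p["S"], "C": (EMPTY, EMPTY, EMPTY), "T": (EMPTY, EMPTY, EMPTY)}


def forward_rule(p: dict) -> dict:
    """p_F: identity on R_S, correspondence and target components of p."""
    _, _, r_s = p["S"]
    return {"S": (r_s, r_s, r_s), "C": p["C"], "T": p["T"]}


# Hypothetical short-cut rule deleting one source edge and one target edge.
p = {
    "S": (fs("rootP", "p", "rootP->p"), fs("rootP", "p"), fs("rootP", "p")),
    "C": (fs("c1"), fs("c1"), fs("c1")),
    "T": (fs("rootF", "f", "rootF->f"), fs("rootF", "f"), fs("rootF", "f")),
}
p_s, p_f = source_rule(p), forward_rule(p)
```

The deleted source edge `rootP->p` no longer occurs in the LHS of `p_f`, which reflects that a forward rule works on the already edited source model.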


The above definition is motivated by our application scenario, i.e., the case where 
a user edits the source (or target) model independently of the other parts. The 
partial morphism in the forward rule reflects that a model change may introduce 
a situation where the result is no longer a triple graph. A deleted source element 
may have a preimage in the correspondence graph that is not deleted as well. 
In the example short-cut rules in Fig. 4, this problem does not occur since only edges are deleted. But in general, this definition of $p_S$ has the disadvantage that $p_S$ is often not applicable to any triple graph, since the result would not be one.

In practical applications, however, the source rule specifies a user edit action 
that is performed on the source part only, ignoring correspondence and target 
graphs. The fact that the result is not a triple graph any longer is not a technical 
problem. A missing source element that should be referenced by a correspondence 
element gives information about a location that needs some repair. Therefore, 
we define the application of a source rule such that the resulting triple graph 
is allowed to be partial. Furthermore, forward rules may be applied to partial 
triple graphs allowing for dangling correspondence relations. 


Definition 5 (Constructing an operationalized rule application). Let a triple graph rule $p = (L_{SCT} \xleftarrow{(l_S, l_C, l_T)} K_{SCT} \xrightarrow{(r_S, r_C, r_T)} R_{SCT})$ with source rule $p_S$ and forward rule $p_F$ be given. An operationalized rule application $G \Rightarrow_{p_S, m_S} G' \Rightarrow_{p_F, m_F} H$ is constructed as follows:

1. The rule $p_S^S = (L_S \xleftarrow{l_S} K_S \xrightarrow{r_S} R_S)$ is the projection of $p_S$ to its source part.
2. Given a match $m_S$ for $p_S$, construct the transformation $t_S^S: G_S \Rightarrow_{p_S^S, m_S^S} H_S$, called source application and inducing the span $G_S \xleftarrow{g_S} D_S \xrightarrow{h_S} H_S$.
3. The transformation $t_S^S$ can be extended to the transformation $t_S: G = (G_S \xleftarrow{\sigma_G} G_C \xrightarrow{\tau_G} G_T) \Rightarrow_{p_S, m_S} G' = (H_S \dashleftarrow G_C \xrightarrow{\tau_G} G_T)$ via $p_S$ at match $m_S$. The partial morphism $G_C \dashrightarrow H_S$ is given as the span $G_C \hookleftarrow G'_C \rightarrow H_S$ that arises as pullback of the co-span $G_C \xrightarrow{\sigma_G} G_S \xleftarrow{g_S} D_S$ as depicted in Fig. 10, i.e., as morphism $h_S \circ \pi_D: G'_C \to H_S$ with domain $G'_C$, where $\pi_D: G'_C \to D_S$ is the pullback projection.
4. Given co-match $n_S: R_S \to H_S$ and matches $m_X: L_X \to G_X$ with $X \in \{C, T\}$ such that both arising squares are commuting, i.e., $m_F = (n_S, m_C, m_T)$ is a morphism of partial triple graphs, construct transformation $t_F: G' \Rightarrow_{p_F, m_F} H = (H_S \xleftarrow{\sigma_H} H_C \xrightarrow{\tau_H} H_T)$, called forward application, using transformations $G_X \Rightarrow_{p_X, m_X} H_X$ for $X \in \{C, T\}$ if they exist and if there are morphisms $\sigma_D: D_C \to H_S$ and $\tau_D: D_C \to D_T$ such that $H_S D_C D_T \to H_S G_C G_T$ and $R_S K_C K_T \to H_S D_C D_T$ are triple morphisms.


Fig. 10. Retrieval of partial morphism $G_C \dashrightarrow H_S$ as pullback (PB) of $G_C \to G_S \leftarrow D_S$, composed with $D_S \to H_S$
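Because $g_S: D_S \to G_S$ is a monomorphism, the pullback in Fig. 10 is, for finite sets, simply the preimage of $D_S$ under the correspondence-to-source map $\sigma_G: G_C \to G_S$. The hypothetical sketch below computes the resulting partial morphism $G_C \dashrightarrow H_S$ as a dictionary. Two simplifications are assumed: $g_S$ is taken as a subset inclusion, and the maps are named `sigma_G` and `h_S`.

```python
def corr_to_source(sigma_G: dict, D_S: set, h_S: dict) -> dict:
    """Partial morphism G_C --> H_S (Definition 5, step 3).

    sigma_G: G_C -> G_S (total); D_S: subset of G_S (g_S as inclusion);
    h_S: D_S -> H_S. The keys of the result form the pullback domain G'_C.
    """
    return {c: h_S[s] for c, s in sigma_G.items() if s in D_S}


sigma_G = {"c_root": "rootP", "c_p": "p"}   # correspondence -> source
D_S = {"rootP"}                             # the source edit deleted Package p
h_S = {"rootP": "rootP"}                    # nothing created on the source side
```

The correspondence node `c_p` drops out of the domain. It is exactly the dangling correspondence element that a subsequent forward application has to delete.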


In the setting of this paper, it is enough to allow for partial morphisms only 
in the input graph and not in the output graph of a forward rule application. 
Intuitively this means that such an application deletes those elements from the 
correspondence graph that could not be mapped to elements in the source graph 
any longer and additionally deletes the preimages in the correspondence graph 
of all deleted elements from the target graph as well (if there are any). The next 
lemma states that the application of a source rule is well-defined, i.e., that the 
mentioned partial morphism actually exists. 


Lemma 6 (Correctness of application of source rules). Let a (non-monotonic) triple graph rule

$$p = (L_{SCT} \xleftarrow{(l_S, l_C, l_T)} K_{SCT} \xrightarrow{(r_S, r_C, r_T)} R_{SCT})$$

with source rule $p_S$ and projection $p_S^S$ to the source part be given. Given a match $m_S$ for $p_S$ to a triple graph $G = (G_S \xleftarrow{\sigma_G} G_C \xrightarrow{\tau_G} G_T)$ such that $G_S \Rightarrow_{p_S^S, m_S^S} H_S$, the partial morphism $G_C \dashrightarrow H_S$ as described in Definition 5 exists.


The next theorem states that a sequential application of a source and a forward rule indeed coincides with an application of the original rule as long as the matches are consistent. This means that the forward rule has to match the RHS $R_S$ of the source rule again, and that the LHS $L_C$ of the correspondence rule needs to be matched in such a way that all elements not belonging to the domain of the partial morphism from correspondence to source part in the input model are deleted. The forward rule application defined in Definition 5 fulfills this condition by construction.


Theorem 7 (Synthesis of rule applications). Let a triple graph rule $p$ with source and forward rules $p_S$ and $p_F$ be given. If there are applications $G \Rightarrow_{p_S, m_S} G'$ with co-match $n_S$ and $G' \Rightarrow_{p_F, m_F} H$ with $m_F = (n_S, m_C, m_T)$ as constructed above, then there is an application $G \Rightarrow_{p, m} H$ with $m = (m_S, m_C, m_T)$.
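Theorem 7 can be illustrated at the level of element sets. The Python sketch below applies each component of a rule as a DPO-style step $X \mapsto (X \setminus (L \setminus K)) \cup (R \setminus K)$, assuming identity matches on element names. It then checks that the source rule followed by the forward rule yields the same triple as the original rule. Correspondence morphisms are ignored, so the sketch only illustrates the statement and does not prove it. The rule and graph are hypothetical, and because only edges are deleted, no partial morphism arises.

```python
EMPTY = frozenset()


def fs(*items):
    return frozenset(items)


def apply(graph: dict, rule: dict) -> dict:
    """One DPO-style step per component, assuming identity matches on names."""
    result = {}
    for part, (lhs, kernel, rhs) in rule.items():
        assert lhs <= graph[part], f"no match in component {part}"
        result[part] = (graph[part] - (lhs - kernel)) | (rhs - kernel)
    return result


def source_rule(p: dict) -> dict:
    return {"S": p["S"], "C": (EMPTY,) * 3, "T": (EMPTY,) * 3}


def forward_rule(p: dict) -> dict:
    r_s = p["S"][2]
    return {"S": (r_s, r_s, r_s), "C": p["C"], "T": p["T"]}


p = {  # hypothetical rule deleting a source edge and the corresponding target edge
    "S": (fs("rootP", "p", "rootP->p"), fs("rootP", "p"), fs("rootP", "p")),
    "C": (fs("c1"), fs("c1"), fs("c1")),
    "T": (fs("rootF", "f", "rootF->f"), fs("rootF", "f"), fs("rootF", "f")),
}
G = {"S": fs("rootP", "p", "rootP->p"), "C": fs("c1"), "T": fs("rootF", "f", "rootF->f")}

direct = apply(G, p)                           # G => H via p
G_prime = apply(G, source_rule(p))             # user edit on the source side only
via_forward = apply(G_prime, forward_rule(p))  # repair of correspondence and target
```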
4.2 Language-Preserving Short-Cut Rules


In this section we identify sufficient conditions for an application of a short-cut 
rule that guarantee the result to be an element of the language of the original 
grammar. Since our conditions apply to arbitrary adhesive categories and are 
not specific to TGGs, we present the result in its general form.


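Theorem 8 (Characterization of valid applications). In an adhesive category $\mathcal{C}$, given a sequence of transformations

$$G \Rightarrow_{r, m} G_0 \Rightarrow_{p_1, m_1} G_1 \Rightarrow_{p_2, m_2} \cdots \Rightarrow_{p_t, m_t} G_t \Rightarrow_{r^{-1} \ltimes_k r', m_{sc}} H$$

with rules $p_1, \dots, p_t$ and $r^{-1} \ltimes_k r'$ being the short-cut rule of monotonic rules $r: L \hookrightarrow R$ and $r': L' \hookrightarrow R'$ along a common kernel $k$, there is a match $m'$ for $r'$ in $G$ and a transformation sequence

$$G \Rightarrow_{r', m'} G'_0 \Rightarrow_{p_1, m'_1} G'_1 \Rightarrow_{p_2, m'_2} \cdots \Rightarrow_{p_t, m'_t} H,$$

provided that

1. the application of $r^{-1} \ltimes_k r'$ with match $m_{sc}$ is sequentially independent of the sequence of transformations $G_0 \Rightarrow_{p_1, m_1} G_1 \Rightarrow_{p_2, m_2} \cdots \Rightarrow_{p_t, m_t} G_t$ and
2. the thereby implied match $m'_{sc}$ for $r^{-1} \ltimes_k r'$ in $G_0$, restricted to the RHS $R$ of $r$, equals the co-match $n: R \to G_0$ of the transformation $G \Rightarrow_{r, m} G_0$ (i.e., $m'_{sc} \circ j_R = n$, where $j_R$ embeds $R$ into the LHS of $r^{-1} \ltimes_k r'$ as in Fig. 6).

In particular, given a grammar $GG = (\mathcal{R}, S)$ such that $r, r', p_1, \dots, p_t \in \mathcal{R}$ and $G \in \mathcal{L}(GG)$, then $H \in \mathcal{L}(GG)$.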


Independence of the short-cut rule application $t_{sc}: G_t \Rightarrow_{r^{-1} \ltimes_k r', m_{sc}} H$ from the preceding transformation sequence $t: G_0 \Rightarrow^* G_t$ requires the existence of morphisms in two directions: morphisms $d_i^{sc}$ from the LHS of the short-cut rule to the context objects $D_i$ arising in $t$, and morphisms $d_i$ from the right-hand sides $R_i$ of the rules $p_i$ to the context object of $t_{sc}$ (shifted further and further to the beginning of the sequence). In the case of (typed triple) graphs, the existence of morphisms $d_i^{sc}$ ensures that none of the rule applications in $t$ enabled the transformation $t_{sc}$. The existence of morphisms $d_i$ ensures that the transformation $t_{sc}$ does not delete structure needed to perform the transformation sequence $t$.


Application to model synchronization. The results in Theorems 7 and 8 are the 
formal basis for an automatic construction of repair rules. Theorem 7 ensures that 
a suitable edit action followed by application of a repair rule at the right match is 
equivalent to the application of a short-cut rule. Thus, whenever an edit action 
on the source model (or symmetrically the target model) corresponds to the 
source-action (target-action) of a short-cut rule, application of the corresponding 
forward (backward) rule synchronizes the model again. Since the language of a 
TGG is defined by its rules, every valid model can be reached from every other 
valid model by inverse application of some of the rules of the grammar followed 
by normal application of some rules. Often, edit actions are rather small steps 
(or at least consist of those). Thus, it is not unreasonable to expect that many
typical edit actions can be realized as short-cut rules as these formalize the 
inverse application of a rule followed by application of a normal one. Theorem 8 
characterizes the matches for short-cut rules at which application stays in the 
language of the TGG. For operational short-cut rules, this can either be used 
for detecting invalid edit actions or determining valid matches for synchronizing 
forward rules. 


5 Implementation and Evaluation 
Implementation. Our implementation¹ of an optimized model synchronizer is based on the existing EMF-based general-purpose graph and model transformation tool eMoflon [21]. It offers support for rule-based unidirectional and bidirectional graph transformations, where the latter is based on TGGs. To support an
effective model synchronizer, we automatically calculate a small but useful subset 
of all possible short-cut rules. This is done by overlapping as many created ele- 
ments as possible and only varying in the way that context elements are mapped 
onto each other. These selected short-cut rules are operationalized to get repair 
rules that allow us to repair broken links similar to our example in Sect. 2. The 
model synchronization process is based on an incremental graph pattern matcher 
that tracks all matches that dis-/appear due to model changes. Thus, it offers the 
ability to react to model changes without the need to recompute matches from 
scratch. Our implementation uses this technique by processing all those matches 
marked as broken by the pattern matcher after a model change. A broken match 
is the starting point to find a repair match as it is defined by the co-match of 
the performed model change and has to be extended. If the pattern matcher can 
extend a broken match to a repair match, the corresponding short-cut repair rule 
can be applied. Otherwise, we fall back to the old synchronization strategy of revoking the current step. This fully automated synchronization process ensures that we are able to restore consistency as long as the edited domain model still resides in the language of our TGG.
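The control flow described above can be sketched as a worklist loop over broken matches. The callbacks `find_repair_match`, `apply_repair`, and `revoke` are hypothetical placeholders for the incremental pattern matcher and the rule engine; they are not eMoflon's API. In this sketch, revoking a step may break further matches, and those matches are added to the worklist.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional, Tuple


def synchronize(broken: Iterable[Hashable],
                find_repair_match: Callable[[Hashable], Optional[object]],
                apply_repair: Callable[[object], None],
                revoke: Callable[[Hashable], Iterable[Hashable]]
                ) -> List[Tuple[str, Hashable]]:
    worklist = deque(broken)
    log = []
    while worklist:
        match = worklist.popleft()
        repair = find_repair_match(match)    # try to extend the broken match
        if repair is not None:
            apply_repair(repair)             # short-cut repair rule preserves elements
            log.append(("repair", match))
        else:
            worklist.extend(revoke(match))   # fallback; may break dependent matches
            log.append(("revoke", match))
    return log


applied = []
log_repair = synchronize(["p"], lambda m: "repair:" + m, applied.append, lambda m: [])
cascade = {"p": ["subP"], "subP": []}
log_revoke = synchronize(["p"], lambda m: None, applied.append, lambda m: cascade[m])
```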


Evaluation. Our experimental setup consists of 23 TGG rules (shown in our technical report [9]) that specify consistency between Java AST and custom documentation models, and 37 short-cut rules derived from our TGG rule set. A small, modified excerpt of this rule set was given in Sect. 2. For this evaluation, however, we define consistency not only between Package and Folder hierarchies but also between type definitions (e.g., Classes and Interfaces) as well as Methods and their corresponding documentation entries. We extracted five models from Java projects hosted on GitHub using the tool MoDisco [4] and translated them into our own documentation structure. In addition, we generated five synthetic models consisting of n-level Package hierarchies, with each non-leaf Package containing five sub-Packages and each leaf Package containing five Classes. Given such Java models, we refactored each model in three different scenarios, e.g., by moving a Class from one Package to another or by completely relocating a Package. Then we used eMoflon to synchronize these changes in order to restore consistency to the documentation model, with and without repair rules.

¹ Both the implementation and the evaluation workspace can be accessed via https://github.com/Arikae00/FASE19_eMoflon-evaluation.
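The synthetic models can be generated according to one reading of the description above, in which level 1 is a single root Package. The function name, the identifiers, and this level convention are our assumptions. The resulting counts are not claimed to correspond to the element counts in Table 1.

```python
def synthetic_model(n: int, fanout: int = 5):
    """Hypothetical generator: returns (packages, package_parent, class_parent)."""
    packages = ["pkg"]
    package_parent = {}
    level = ["pkg"]                        # level 1: a single root Package
    for _ in range(1, n):                  # add levels 2..n
        next_level = []
        for pkg in level:
            for i in range(fanout):        # each non-leaf Package gets five sub-Packages
                child = f"{pkg}.{i}"
                package_parent[child] = pkg
                next_level.append(child)
        packages.extend(next_level)
        level = next_level
    # each leaf Package contains five Classes
    class_parent = {f"{pkg}.C{i}": pkg for pkg in level for i in range(fanout)}
    return packages, package_parent, class_parent


packages1, _, classes1 = synthetic_model(1)
packages3, parents3, classes3 = synthetic_model(3)
```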

These synchronization steps are the subject of our evaluation, and we pose the following research questions: (RQ1) For different kinds of changes, how many elements can be preserved that would otherwise be deleted and recreated? (RQ2) How does our new approach affect the runtime performance? (RQ3) Are there specific scenarios in which our approach performs especially well or poorly?

Repair rules were developed to avoid unnecessary deletions of elements caused by reverting too many rule applications in order to restore consistency, as shown by example in Sect. 2. Hence, our approach should perform especially well for model changes that target rule applications close to the beginning of a rule sequence, as such changes possibly render many rule applications invalid. For example, altering a root Package by creating a new Package as root implies that many rule applications have to be reverted to synchronize the changes correctly (Scenario 1). In contrast, our approach might perform poorly when a model change does not cause a large cascade of invalid rule applications. Hence, we move Classes between Packages to measure whether the effort of applying repair rules incurs a performance loss when neither the new nor the old algorithm has to repair many broken rule applications (Scenario 2). Finally, we simulate a scenario between the first two by relocating leaf Packages (Scenario 3).


Table 1. Legacy vs. new synchronizer: time in seconds (Sec) and number of created elements (Elts). Each cell gives Sec / Elts. Trans. = initial translation (both synchronizers), Legacy = legacy synchronization, Repair = synchronization by repair rules, "-" = no value reported.

Model            | Trans.       | Legacy Scen. 1 | Legacy Scen. 2 | Legacy Scen. 3 | Repair Scen. 1 | Repair Scen. 2 | Repair Scen. 3
lang.List        | 0.3 / 25     | 0.2 / 20       | - / -          | 0.06 / 5       | 0.2 / 0        | - / -          | 0.03 / 0
tgg.core         | 6.4 / 1.6k   | 39 / 1.6k      | 3.8 / 99       | 0.64 / 17      | 0.8 / 0        | 0.11 / 0       | 0.05 / 0
modisco.java     | 9.9 / 3.2k   | 228 / 3.3k     | 18.6 / 192     | 3.6 / 33       | 25 / 0         | 0.2 / 0        | 0.09 / 0
eclipse.graphiti | 20.7 / 6.5k  | 704 / 6.5k     | 63.9 / 490     | 5.65 / 25      | 6.1 / 0        | 0.21 / 0       | 0.09 / 0
eclipse.compare  | 10.74 / 3.8k | 83 / 3.7k      | 3.1 / 76       | 2.36 / 47      | 0.7 / 0        | 0.08 / 0       | 0.04 / 0
synthetic n = 1  | 0.3 / 35     | 0.32 / 30      | 0.2 / 30       | 0.03 / 1       | 0.1 / 0        | 0.05 / 0       | 0.03 / 0
synthetic n = 2  | 0.9 / 160    | 1.03 / 155     | 0.3 / 30       | 0.03 / 1       | 0.1 / 0        | 0.05 / 0       | 0.02 / 0
synthetic n = 3  | 2.8 / 785    | 6 / 780        | 0.4 / 30       | 0.04 / 1       | 0.1 / 0        | 0.07 / 0       | 0.02 / 0
synthetic n = 4  | 13.5 / 3.9k  | 86.3 / 3.9k    | 1.2 / 30       | 0.08 / 1       | 0.4 / 0        | 0.14 / 0       | 0.04 / 0
synthetic n = 5  | 91.5 / 20k   | 2731 / 20k     | 17.4 / 30      | 0.14 / 1       | 1.5 / 0        | 0.37 / 0       | 0.09 / 0


Table 1 depicts the measured times (Sec) and the number of created elements (Elts) in each scenario. Each created element also represents a deleted element, e.g., through revoking and reapplying a rule or through applying a repair rule that creates and deletes elements. In more detail, the table shows measurements for the initial translation of the MoDisco model into the documentation structure and for the synchronization steps of each scenario, using the legacy synchronizer without repair rules and the new synchronizer with repair rules.

With respect to our research questions stated above, we interpret this table as follows: The right columns of the table clearly show that using repair rules preserves all those elements in our scenarios that would otherwise be deleted and recreated by the legacy algorithm² (RQ1). The runtime shows a significant performance gain for Scenario 1, which includes a worst-case model change (RQ2). Repair rules do not introduce an overhead compared to the legacy algorithm, as can be seen from the synthetic time measurements in Scenario 3, where only one rule application has to be repaired or reapplied (RQ2). Our new approach excels when the cascade of invalidated rule applications is long. Even if this is not the case, it does not introduce any measurable overhead compared to the legacy algorithm, as shown in Scenarios 2 and 3 (RQ3).
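The performance gain in Scenario 1 can be expressed as a speedup of the repair-rule synchronizer over the legacy synchronizer, computed directly from Table 1. The snippet below uses the Sec values of four rows copied from the table and is only a convenience calculation.

```python
# (legacy Scen. 1 seconds, repair-rule Scen. 1 seconds), copied from Table 1
scenario1 = {
    "tgg.core": (39, 0.8),
    "eclipse.graphiti": (704, 6.1),
    "eclipse.compare": (83, 0.7),
    "synthetic n = 5": (2731, 1.5),
}
speedup = {model: legacy / repair for model, (legacy, repair) in scenario1.items()}
for model, factor in speedup.items():
    print(f"{model}: {factor:.0f}x faster with repair rules")
```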


Threats to validity. Our evaluation is based on five real-world and five synthetic models. Of course, there exists a wide range of projects that differ significantly from each other in size, purpose, and developer style. Thus, the results may differ for other projects. Nonetheless, we argue that the four larger projects extracted from GitHub are representative, since they are part of established tools from the Eclipse community. In this evaluation, we selected three edit operations that are representative w.r.t. their dependency on other edit operations. They may not be representative w.r.t. other aspects such as size or kind of change, which seem to be of minor importance in this context. Also, we limited our evaluation to one TGG rule set due to space constraints. However, in our experience, the approach shows similar results for a broader range of TGGs, which can be accessed through eMoflon.


6 Related Work 


Reuse in existing work on TGGs. Several approaches to model synchronization based on TGGs suffer from the fact that the revocation of a certain rule application triggers the revocation of all dependent rule applications as well [12,16,19]. Especially from a practical point of view, such cascades of deletions should be avoided: In [10], Giese and Hildebrandt propose rules that save nodes instead of deleting and then re-creating them. Their examples can be realized by our construction of repair rules, but they present neither a general construction nor a proof of correctness. This is again left as future work in [11], where other aspects of [10] are formalized and proven to be correct.

In [3], Blouin et al. added a specially designed repair rule to the rules of their case study to avoid information loss. Greenyer et al. [14] also propose not to delete elements directly but to mark them for deletion and to allow for reuse of these marked elements in other rule applications. But this approach also comes without any formalization or proof of correctness. Again, the given example can be realized as a short-cut repair. These uncontrolled and informal approaches are potentially harmful: re-using elements wrongly may lead to, e.g., containment cycles or unconnected data. Hence, providing precise and sufficient conditions for correct re-use of data is highly desirable, as re-use may improve scalability and decrease data loss. Our short-cut rules formalize when data can be correctly reused. In summary, we not only offer a unifying principle behind different practically used improvements of TGGs but also give a precise formalization that allows for automatic construction of the rules needed. Thereby, we present conditions under which rule applications lead to valid outputs.

² Scenario 1: We expect the new root element to already be translated.


Comparison to other bx approaches. Anjorin et al. [2] compared three state-of-the-art bx tools, namely eMoflon [21] (rule-based), mediniQVT [1] (constraint-based), and BiGUL [17] (bx programming language), w.r.t. model synchronization. They point out that synchronization with eMoflon is faster than with both other tools, as the runtime of these tools correlates with the overall model size while the runtime of eMoflon correlates with the size of the changes done by edit operations. Furthermore, eMoflon was the only tool able to solve all but one of the synchronization scenarios. The remaining scenario was not solved because eMoflon deleted more model elements than absolutely necessary in that case. Using short-cut repair rules, we can solve the remaining scenario and, moreover, further increase eMoflon's model synchronization performance.


Change-preserving model repair. Change-preserving model repair as presented 
in [22,25] is closely related to our approach. Assuming a set of consistency- 
preserving rules and a set of edit rules to be given, each edit rule is accompanied 
by one or more repair rules completing the edit step, if possible. Such a complement rule is considered a repair rule of an edit rule w.r.t. an overarching consistency-preserving rule. Operationalized TGG rules fit into that approach
but provide more structure: As graphs and rules are structured in triples, a source 
rule is also an edit rule being complemented by a forward rule. In contrast to 
that approach, source and forward rules can be automatically deduced from a 
given TGG rule. By our use of short-cut rules we introduce a pre-processing step 
to first enlarge the sets of consistency-preserving rules and edit rules. 


Generalization of correspondence relation. Golas et al. provide a formalization of TGGs in [13] that allows one to generalize correspondence relations between source
and target graphs as well. They use special typings for the source, target, and 
correspondence parts of a TGG and for edges between a correspondence part and 
source and target part instead of using graph morphisms. That approach also 
allows for partial correspondence relations. But it makes the deletion of elements 
more complex as it becomes important how many incident edges a node has (at 
least in the double-pushout approach). We therefore opted for introducing triple 
graphs with partial morphisms. They allow us to just delete a node without 
caring if it is needed within an existing correspondence relation. 


Efficient Model Synchronization 131 


7 Conclusion 


Model synchronization, i.e., the task of restoring consistency between two models
after a model change, poses challenges to modern bx approaches and tools: we
expect them to synchronize changes without losing data in the process, thus
preserving information, and we furthermore expect them to show reasonable
performance. While Triple Graph Grammars (TGGs) provide the means to perform model
synchronization tasks in general, these two requirements cannot always be
fulfilled, since basic TGG rules do not provide adequate means to support
intermediate model editing. Therefore, we propose short-cut rules as additional
edit operations: a special form of generalized TGG rules that allow taking back one
edit action and performing an alternative one. In our evaluation, we show that
operationalized short-cut rules allow for model synchronization with considerably
decreased data loss and improved runtime.

To better cope with practical application scenarios, we would like to extend our
approach by formally incorporating type inheritance, application conditions and
attributes in the model synchronization process. Since all of these have been
formalized in the setting of (M-)adhesive categories and our present work uses
that framework as well, these extensions are prepared but left to future work.
Propagating changes from one domain to another is basically done here by
operationalizing short-cut rules. A more challenging task is what we call model
integration, where related pairs of models are edited concurrently and have to be
synchronized. These model edits may be in conflict across model boundaries.
Allowing short-cut rules in model integration is left to future work; our hope is
to decrease data loss and to improve the runtime of model integration tasks as well.
Abstract. When model transformations are used to implement consistency
relations between very large models (VLMs), incrementality plays a cornerstone
role in the realization of practical consistency maintainers. State-of-the-art
model transformation engines with support for incrementality normally rely on a
publish-subscribe model for linking model updates (deltas) to the application of
model transformation rules, in so-called dependencies, at run time. These deltas
can then be propagated along an already executed model transformation. A small
number of such engines use domain-specific languages (DSLs) for representing
model deltas offline in order to enable their use in asynchronous, event-based
execution environments.

The principal contribution of this work is the design of a forward delta
propagation mechanism for incremental execution of model transformations, which
decouples dependency tracking from delta propagation using two innovations.
First, the publish-subscribe model is replaced with dependency injection,
physically decoupling domain models from consistency maintainers. Second, a
standardized representation of model deltas is reused, facilitating
interoperability with EMF-compliant tools, both for defining deltas and for
processing them asynchronously. This procedure has been implemented in a model
transformation engine, whose performance has been evaluated empirically using the
VIATRA CPS benchmark. In the experiments performed, the new transformation engine
shows gains of several orders of magnitude in the initial phase of the incremental
execution of the benchmark model transformation, and delta propagation is
performed in real time, independently of the size of the models involved, whereas
the best-performing approach to date depends on model size.


Keywords: Mappings between languages - Traceability - 
Incremental model transformation - Performance benchmark 


1 Introduction 


Significant issues in the application of Model-Driven Engineering (MDE) to
large-scale industrial problems stem from the interoperability and scalability of
current MDE tools [1,16,17]. Model transformation, widely accepted as the heart
and soul of MDE [23], deals with model manipulation either by translating models
or by synchronizing them. Current tool support for model transformation is a key
root cause of many of the bottlenecks hampering scalability in MDE [2,8]. This is
particularly crucial when transformations are used to implement consistency
maintainers between very large models (VLMs), consisting of millions of elements.
In this context, incrementality ensures that only those parts of the model that
are inconsistent or that have been modified (a model delta) are transformed or,
more precisely, propagated along an already executed transformation [11,12].

Current state-of-the-art approaches that support incremental execution of 
model transformations share common features: the delta propagation mecha- 
nism is usually decoupled from the delta detection mechanism in order to facil- 
itate maintainability of the consistency maintainer; and deltas are represented 
either in memory for synchronous notification or offline, with dedicated domain- 
specific languages, for asynchronous notification. The most mature tools rely 
on a publish/subscribe mechanism, where model deltas are notified at run time 
whenever a model is updated. This notification mechanism is synchronous and 
loosely couples model updates with the delta propagation mechanism, facilitat- 
ing maintainability of the underlying transformation engine after fixing the type 
of notification. However, it usually requires an observer for each object that can 
be modified, with a consequent impact on performance, and the model transfor- 
mation must be live, in memory, in order to listen for changes. These problems 
can be avoided by using offline deltas. The publish/subscribe mechanism can be 
extended to enable asynchronous delta notification but this is normally achieved 
by using dedicated domain-specific languages to represent deltas offline, which do 
not involve standardized formats, hindering the interoperability of those trans- 
formation engines in existing modeling tool ecosystems. 

In this paper, the design of a forward delta propagation procedure is presented
for executing model transformations in incremental mode that can handle
documented change scenarios [4], i.e. documents representing a change to a given
source model. Such documents are defined with the EMF change model [24], both
conceptually and implementation-wise, guaranteeing interoperability with
EMF-compliant tools. This design decision replaces publish/subscribe notification
with dependency injection: each notification is performed directly by the
implementation of the domain model at run time by injecting the dependency
corresponding to the model update that has been performed. Aspect-oriented
programming is used to weave code into an already existing implementation of a
domain model, totally decoupling domain models from the consistency maintainer at
design time. The proposed forward delta propagation procedure has been
implemented in YAMTL [6], a model transformation engine for VLMs, enabling the
execution of model transformations both in batch mode and in incremental mode
without additional user specification overhead. This new extension dramatically
improves on the performance of the batch execution mode when dealing with sparse
model deltas, which can be propagated in real time (i.e. in µs).
This work is structured as follows: Sect. 2 provides a self-contained description
of the class of model transformations supported, using a class diagram to
relational schema model transformation; Sect. 3 presents the forward propagation
procedure implemented in the model transformation engine together with the main
innovations; Sect. 4 discusses the performance of the transformation engine with
an adaptation of the VIATRA CPS benchmark; Sect. 5 discusses related work from
reactive and bidirectional model transformation.


2 Model Transformation: A Running Example 


The model transformations considered in this work are unidirectional and
out-place. For example, in the well-known example that maps class diagrams to
relational schemas, a class diagram is used by queries to extract information and
a relational schema is built from scratch. From a graph transformation
perspective, both models are considered to form part of the same graph in order
to enable transformation by rewriting. In that case, we only consider
transformations where the two models are clearly disjoint subgraphs and where
rewriting is performed deterministically.

In this work, model transformations are represented using an implementation- 
agnostic graphical syntax, quite close to that used in the graph transformation 
literature. In this representation, metamodels are given as class diagrams, the 
abstract syntax of models is given as object diagrams and model transformations 
are represented as a collection of rules, where each rule is defined as a pair of 
model patterns, called left-hand side (LHS) and right-hand side (RHS). The 
notions of metamodel, model and model pattern correspond to those of type
graph, attributed graph with containments and node inheritance, and graph 
pattern in the graph transformation literature [5,10]. For example, the rules 
A->C and R->FK of Fig. 1 map attributes to columns. The $ before a variable
denotes string interpolation. 

Graph patterns in rules can be augmented with universally quantified vari- 
ables (represented by an overlaid box). Moreover, rules are augmented with a 
when clause to express conditions that must be satisfied by the variables in LHS, 
and with a where clause to indicate how variables from LHS and from RHS 
are related via the application of other rules, expressed as two graph patterns. 
Formulas in a when clause may be expressed in conjunctive form, as all filter 
conditions must be satisfied in order for the rule to be applied, whereas formu- 
las in a where clause may be expressed in disjunctive form (assuming mutually 
exclusive conditions), as all the side effects expressed in a where clause must be 
evaluated. The variables in the RHS of the main rule must appear either in the LHS
of the main rule or in the RHS of a where transformation step. The rule C->T of
Fig. 1 illustrates how to map a class to a table with a primary key column PK_COL:
for each attribute A whose type is a DataType, the corresponding column is
obtained by applying the rule A->C, and for each attribute OTHER whose type is the
class C matched in the LHS of the main rule, a new foreign key column is added to
the table T with the rule R->FK.
Fig. 1. Metamodels, example and transformation rules.


From an operational point of view, transformation rules are applied
unidirectionally from LHS to RHS, performing an out-place transformation in two
steps. First, during the matching phase, matches for the rules in the model
transformation are found, provided they are not shared by different rules, and
are included in a set matchPool. A match is formally defined as a graph morphism
from the LHS to the source graph that satisfies the when conditions, but for the
sake of presentation it is represented in this paper as a map from variables to
object identifiers.
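As an illustration of matches as maps from variables to object identifiers, the following Python sketch runs a matching phase over a hypothetical dictionary encoding of the source model of the running example. All names (`SOURCE`, `RULES`, `matching_phase`) are illustrative and not YAMTL's API; the sketch assumes single-element LHS patterns, that attributes 2 and 3 are typed by the DataType 0, and that the when conditions of A->C and R->FK are mutually exclusive, so that no match is shared between rules.

```python
from typing import Callable, Dict, List, Tuple

Model = Dict[int, dict]
# A match is represented as (rule name, ((variable, object id), ...)).
Match = Tuple[str, Tuple[Tuple[str, int], ...]]

# Hypothetical encoding of the source model Ms of the running example.
SOURCE: Model = {
    0: {"kind": "DataType", "name": "String"},
    1: {"kind": "Class", "name": "Item", "attr": [2, 3]},
    2: {"kind": "Attribute", "name": "product", "type": 0, "multiValued": False},
    3: {"kind": "Attribute", "name": "date", "type": 0, "multiValued": False},
    4: {"kind": "Class", "name": "Order", "attr": [5]},
    5: {"kind": "Attribute", "name": "items", "type": 1, "multiValued": True},
}

def type_kind(m: Model, oid: int) -> str:
    """Metaclass of the type of attribute `oid`."""
    return m[m[oid]["type"]]["kind"]

# (rule name, input variable, input metaclass, when condition); single-element LHS only.
RULES: List[Tuple[str, str, str, Callable[[Model, int], bool]]] = [
    ("C->T", "c", "Class", lambda m, o: True),
    ("A->C", "att", "Attribute", lambda m, o: type_kind(m, o) == "DataType"),
    ("R->FK", "ref", "Attribute", lambda m, o: type_kind(m, o) == "Class"),
]

def extent(m: Model, kind: str) -> List[int]:
    """All objects of a given metaclass, in identifier order."""
    return [oid for oid in sorted(m) if m[oid]["kind"] == kind]

def matching_phase(m: Model) -> List[Match]:
    match_pool: List[Match] = []
    for rule, var, kind, when in RULES:
        for oid in extent(m, kind):
            if when(m, oid):  # the when condition filters candidate matches
                match_pool.append((rule, ((var, oid),)))
    return match_pool

if __name__ == "__main__":
    for match in matching_phase(SOURCE):
        print(match)
```

Under these assumptions, the resulting matchPool contains the five source matches listed in Table 1.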

Second, during the execution phase, each match is processed by triggering the
application of a transformation rule, which is represented as a transformation
step, denoted by $r : \mathit{in} \mapsto o_s \rightarrow \mathit{out} \mapsto o_t$.
A transformation step consists of a labelled pair of two matches: the match for the
input pattern of the rule, which enables its application, and the match for the
output pattern of the rule, with the objects that result from applying the rule.
When a rule is applied, the source model is only used for query purposes, whereas
the target model is constructed by adding the pattern of the RHS instantiated with
values from the variables both in the LHS and in the RHS of where transformation
steps. In addition, where transformation steps may further expand the structure of
the target model. This execution model resembles the application of forward rules
used in triple graph grammars (TGGs) [22], where the source graph is annotated as
rules are applied and only the target graph is constructed, together with a link
in a correspondence graph, where each link denotes a transformation step.
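A transformation step can be pictured as a record pairing a source match with a target match. The sketch below, with the hypothetical names `Step` and `apply_rule`, covers only C->T and A->C and omits where clauses (so columns are not attached to tables); it mainly illustrates the out-place discipline, in which the source model is only read while target objects are created with fresh identifiers drawn from a counter (an assumption; these identifiers need not coincide with those of Fig. 2).

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator

Model = Dict[int, dict]

@dataclass
class Step:
    """A transformation step r : in |-> o_s -> out |-> o_t (labelled pair of matches)."""
    rule: str
    source_match: Dict[str, int]  # input variable -> source object id
    target_match: Dict[str, int]  # output variable -> target object id

def apply_rule(rule: str, source_match: Dict[str, int], source: Model,
               target: Model, fresh: Iterator[int]) -> Step:
    """Out-place application: `source` is only queried, `target` is extended."""
    if rule == "C->T":
        name = source[source_match["c"]]["name"]
        t, pk = next(fresh), next(fresh)
        target[t] = {"kind": "Table", "name": name, "col": [pk]}
        target[pk] = {"kind": "Column", "name": f"pk_{name}"}
        return Step(rule, dict(source_match), {"t": t, "pk_col": pk})
    if rule == "A->C":
        col = next(fresh)
        target[col] = {"kind": "Column", "name": source[source_match["att"]]["name"]}
        return Step(rule, dict(source_match), {"col": col})
    raise ValueError(f"rule {rule} not covered by this sketch")

source: Model = {1: {"kind": "Class", "name": "Item"},
                 2: {"kind": "Attribute", "name": "product"}}
target: Model = {}
fresh = count(1)
steps = [apply_rule("C->T", {"c": 1}, source, target, fresh),
         apply_rule("A->C", {"att": 2}, source, target, fresh)]
```

The list of `Step` records plays the role of the correspondence links: each one remembers which source objects enabled a rule and which target objects it produced.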
3 Delta-Driven Model Transformations


This section presents the mechanism to propagate documented deltas $\delta_s$ from a
source model $M_s$ to a target model $M_t$ in an incremental way, when the
(unidirectional) synchronization correspondence between these two models is
represented with a model transformation $t$ as described in the previous section.
This has been implemented in the YAMTL transformation engine [6], which has been
extended with two modes of execution: initialization, in which the transformation
is executed in batch mode but, additionally, tracks those parts of the source model
involved in transformation steps as dependencies; and propagation, in which the
transformation is executed incrementally for a given source delta.

In order for a model transformation to be executed in propagation mode, it first
needs to be executed in initialization mode, both to create transformation steps
and to inject the dependencies that facilitate the analysis of the impact of
changes on the already executed model transformation. Therefore, the
transformation $t$ is applied to $M_s$ using the original batch semantics [6] while
injecting dependencies in the transformation engine. Once the initialization is
done, any number of source forward deltas $\delta_s$ can be propagated.

Given a source documented delta $\delta_s$ between a source model $M_s$, already
synchronized with a target model $M_t$ via a model transformation
$t : M_s \Rightarrow_{\overline{\sigma}} M_t$ (where $\overline{\sigma}$ denotes a
sequence of transformation steps), and an updated source model $M_s'$, the
transformation engine propagates the model update $\delta_s$ along $t$. The effect
of this forward propagation is the application of an update $\delta_t$ on the
target model $M_t$.

In the following subsections, we explain the different phases of the new exe- 
cution modes, initialization and propagation, in more detail. As the initialization 
mode faithfully corresponds to the batch execution of a model transformation, 
the discussion of this mode focuses on the type of dependencies that are injected 
in the transformation engine in Sect. 3.1. The discussion of the propagation
mode focuses on how deltas are represented in Sect. 3.2. Then, the two main 
phases of the propagation execution mode, namely impact analysis and delta 
propagation, are explained in Sects. 3.3 and 3.4, respectively. 


3.1 Dependency Injection 


When running a model transformation in initialization mode, the engine monitors
the source model and, whenever an object $o_s$ is matched or a feature call,
represented as a pair $(o_s, f)$ of an EMF object $o_s$ and a feature name $f$, is
performed, a dependency is injected into the dependency registry. A dependency
thereby links either an object $o_s$ or a feature call $(o_s, f)$ to the
transformation steps $r : \mathit{in} \mapsto o_s \rightarrow \mathit{out} \mapsto o_t$
in which it is used. Such dependencies are detected both during the matching phase
and during the execution phase.

In the matching phase, while finding a match for a rule, the engine keeps track of
all the feature calls used in both element and rule when conditions. When a match
is found to be valid, the collected dependencies are injected into the dependency
registry for the transformation step that uses that match; otherwise, they are
discarded. Additionally, when inserting a match in the matchPool, the
transformation engine also records reverse matches as injected dependencies
between matched objects $o_s$ and the transformation step in which they are
matched.

Dependencies may also be found when executing a transformation step, e.g., while
executing initialization expressions associated with attributes in model patterns
in the RHS and in where clauses. In such cases, the transformation engine injects a
dependency for the transformation step every time a feature call on the source
model is detected. As a result, several transformation steps may depend on the
same object $o_s$, when rules have more than one input element, or on the same
feature call $(o_s, f)$.

Table 1. Analysis of dependencies for the initial MT $t : M_s \Rightarrow_{\overline{\sigma}} M_t$ of Fig. 2.

Rule | Source Match | Target Match | Dependencies from $M_s$
C->T | c ↦ 1 | t ↦ 1, pk_col ↦ 4 | (1, name), (1, attr), (5, type), (5, multiValued)
C->T | c ↦ 4 | t ↦ 6, pk_col ↦ 7 | (4, name), (4, attr)
A->C | att ↦ 2 | col ↦ 2 | (2, name)
A->C | att ↦ 3 | col ↦ 3 | (3, name)
R->FK | ref ↦ 5 | fk_col ↦ 5 | (5, name), (5, type), (1, name), (4, name)

Table 1 shows the dependencies that are found when executing the transformation of
Fig. 1 in initialization mode from model $M_s$. Each row in the table represents a
transformation step, where the source match indicates where the rule has been
applied, the target match indicates which objects were created, and the
dependencies are the set of feature calls associated with the transformation step.
Reverse matches are extracted from source matches by reading them in the opposite
direction.
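The dependency registry can be sketched as two maps: one from feature calls $(o_s, f)$ to the steps that performed them, and one holding reverse matches from matched objects to steps. The following hypothetical Python class (`DependencyRegistry`, with made-up step ids s1 to s5 for the rows of Table 1, in order) is not YAMTL's implementation; it assumes the caller only injects dependencies for valid matches and simply drops those collected for invalid ones.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

FeatureCall = Tuple[int, str]  # (object id, feature name)

class DependencyRegistry:
    """Hypothetical registry linking objects and feature calls to step ids."""

    def __init__(self) -> None:
        self.feature_deps: Dict[FeatureCall, Set[str]] = defaultdict(set)
        self.reverse_matches: Dict[int, Set[str]] = defaultdict(set)

    def inject(self, step: str, source_match: Dict[str, int],
               calls: Iterable[FeatureCall]) -> None:
        # Only called once the match of `step` is known to be valid.
        for call in calls:
            self.feature_deps[call].add(step)
        for obj in source_match.values():  # reverse match: object -> step
            self.reverse_matches[obj].add(step)

    def steps_using_call(self, call: FeatureCall) -> Set[str]:
        return set(self.feature_deps.get(call, set()))

    def steps_matching(self, obj: int) -> Set[str]:
        return set(self.reverse_matches.get(obj, set()))

# Populate the registry with the rows of Table 1.
registry = DependencyRegistry()
registry.inject("s1", {"c": 1}, [(1, "name"), (1, "attr"), (5, "type"), (5, "multiValued")])
registry.inject("s2", {"c": 4}, [(4, "name"), (4, "attr")])
registry.inject("s3", {"att": 2}, [(2, "name")])
registry.inject("s4", {"att": 3}, [(3, "name")])
registry.inject("s5", {"ref": 5}, [(5, "name"), (5, "type"), (1, "name"), (4, "name")])
```

Note how (1, name) and (5, type) are shared by the C->T step for class 1 and the R->FK step, which is the situation of several steps depending on the same feature call.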

Dependency injection is configured with an aspect whose pointcut matches 
feature calls under a user-defined namespace. Hence, the model transformation 
engine is entirely decoupled from the domain model at design time. They become 
tightly coupled at compilation time and, hence, at run time. 
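YAMTL relies on an aspect whose pointcut intercepts feature calls. Python has no aspects, so the following hypothetical sketch only emulates the idea: it replaces `__getattribute__` of an already written domain class with a wrapper that reports feature reads while a step is active. The names `weave`, `tracking` and `recorded` are illustrative, and decorating a class stands in for the user-defined namespace of the pointcut.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple

recorded: List[Tuple[int, str, str]] = []  # (object id, feature, step id)
_active: List[Optional[str]] = [None]      # step currently being matched/executed

def weave(cls):
    """Stand-in for a pointcut: intercept feature reads on a domain class."""
    original = cls.__getattribute__

    def traced(self, name):
        value = original(self, name)
        step = _active[0]
        if step is not None and not name.startswith("_") and name != "oid":
            recorded.append((original(self, "oid"), name, step))
        return value

    cls.__getattribute__ = traced
    return cls

@contextmanager
def tracking(step: str) -> Iterator[None]:
    """Mark `step` as active so that feature reads are attributed to it."""
    _active[0] = step
    try:
        yield
    finally:
        _active[0] = None

@weave
class Attribute:  # plain domain class, written without any tracking code
    def __init__(self, oid: int, name: str, multi_valued: bool) -> None:
        self.oid = oid
        self.name = name
        self.multi_valued = multi_valued

date = Attribute(3, "date", False)
_ = date.name          # read outside any step: not recorded
with tracking("s4"):
    _ = date.name      # read while step s4 is active: recorded
```

The `Attribute` class contains no tracking code, mirroring the design-time decoupling; the coupling only appears once the wrapper is installed.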


3.2 Representable Deltas 


The EMF change model [24] is used to represent deltas to an instance of any other
EMF model. It is built into EMF and is, therefore, available for any EMF-compliant
tool. In this section, we describe how a documented delta is represented with the
EMF change model and how it can be automatically defined given any potentially
live atomic update.

A delta consists of a ChangeDescription, which contains a map of objectChanges
that refers to those objects that are updated and, for each such object, contains
a list of FeatureChanges. A FeatureChange (FC) refers to the structural feature
that needs to be updated and provides the new value. For a single-valued
attribute, a FeatureChange contains the new dataValue. For references and
multi-valued attributes, a FeatureChange includes a containment reference
listChanges pointing to ListChanges. ListChanges are used to represent addition
to, removal from, or movement within the given feature values. In particular, a
movement only captures an object changing to a different index within the same
collection; it does not capture structural changes, e.g. a change of container,
which are represented as a removal from and an addition to the corresponding
containment references. When a FeatureChange refers to a containment reference,
objects to be added are pointed to by objectsToAttach and objects to be removed
are pointed to by objectsToDetach.
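To visualize this structure, the following Python dataclasses mirror, in snake_case, the concepts ChangeDescription, objectChanges, FeatureChange, dataValue, listChanges, ListChange, objectsToAttach and objectsToDetach. This is a simplified stand-in, not the EMF Java API: ResourceChanges and values of single-valued references are omitted, objects are plain integer identifiers, and the list index used for delta d is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class ListChange:
    kind: str                            # "ADD", "REMOVE" or "MOVE"
    index: int                           # position in the feature's value list
    values: List[Any] = field(default_factory=list)
    move_to_index: Optional[int] = None  # target index, only for "MOVE"

@dataclass
class FeatureChange:
    feature_name: str
    data_value: Any = None               # new value of a single-valued attribute
    list_changes: List[ListChange] = field(default_factory=list)

@dataclass
class ChangeDescription:
    object_changes: Dict[Any, List[FeatureChange]] = field(default_factory=dict)
    objects_to_attach: List[Any] = field(default_factory=list)
    objects_to_detach: List[Any] = field(default_factory=list)

# Delta a: rename class 4 (Order) to Invoice.
delta_a = ChangeDescription({4: [FeatureChange("name", data_value="Invoice")]})

# Delta d: remove attribute 3 (date), assumed at index 1, from the containment attr of class 1.
delta_d = ChangeDescription(
    {1: [FeatureChange("attr", list_changes=[ListChange("REMOVE", 1, [3])])]},
    objects_to_detach=[3],
)
```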

FeatureChanges capture when a feature value is updated for an object, but EMF also
permits adding root objects to, and removing them from, a resource (representing
the model in memory); such objects need not be contained by any other object.
Such changes are considered to be performed on the resource itself and are
represented with ResourceChanges, one for each changed resource. A ResourceChange
(RC) contains the ListChanges for the root objects of the corresponding resource,
similarly to multi-valued features. For a more detailed explanation of the EMF
change model, we refer the reader to [24].

Table 2 shows a classification of atomic model updates that are representable with
the EMF change model as explained above. Note that moving an object structurally,
case 12 (move (inter.)), is represented in a composite delta by two opposite
actions: removing the object either from the root contents of the resource, if it
is a root object (case 2), or from a containment reference, if it is a contained
object (case 10); and adding it either to the root contents of the resource, if it
is to become a root object (case 1), or to another containment reference in
another container object (case 9). This case is not captured explicitly by the EMF
change model, but the transformation engine is able to infer it, as explained in
the following section.


Table 2. Summary of model update types, with their representation in EMF. 


Cases | Granularity | Level Feature Delta action | Delta representation | DO | DFC 
1,2 | atomic root add/remove | RC::listChanges | v 
3 atomic root move (intra.)| RC::listChanges
4,5 atomic any |single-valued att | add/remove FC v 
6,7 | atomic any | multi-valued att | add/remove | FC::listChanges | Vv | v 
8 atomic any | multi-valued att | move (intra.)| FC::listChanges v 
9,10 | atomic any ref add/remove | FC::listChanges v 
11 atomic any ref move (intra.)| FC::listChanges v 
12 composite any | containment ref | move (inter.) opposite remove V 
and add actions 
in cases {2, 10}/{1, 9} 
A delta, which may represent atomic and composite changes, is defined as an
instance of the EMF change model and can be serialized. EMF also provides
facilities for applying and reversing deltas. Furthermore, EMF provides a change
recorder, which enables recording live updates as a ChangeDescription for either a
root object, a collection of root objects, a resource or a resource set.

The resulting ChangeDescription is the representation of a history scenario [4],
from the updated model to the original one, which is optimized. That is, atomic
changes for the same feature of the same object may be discarded or merged, as
long as the optimization process preserves reversibility. Hence, reversing the
recorded delta may yield fewer changes than were originally made. Reversed deltas
represent documented scenarios and can be propagated along a model transformation,
as discussed in subsequent sections.
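The optimization of recorded deltas can be illustrated on single-valued features. The hypothetical `Recorder` below stores only the first old value of each (object, feature) pair, so consecutive updates of the same feature are merged and updates that end where they started are discarded; `reverse` then yields the documented (forward) delta. This is a toy model of the idea, not how EMF's change recorder is implemented.

```python
from typing import Any, Dict, Tuple

Key = Tuple[int, str]  # (object id, feature name)

class Recorder:
    """Hypothetical recorder for single-valued features of a dict-based model."""

    def __init__(self, model: Dict[int, dict]) -> None:
        self.model = model
        self.original: Dict[Key, Any] = {}

    def set(self, oid: int, feature: str, value: Any) -> None:
        key = (oid, feature)
        if key not in self.original:  # later updates of the same feature are merged
            self.original[key] = self.model[oid].get(feature)
        self.model[oid][feature] = value

    def history_delta(self) -> Dict[Key, Any]:
        """Changes leading from the updated model back to the original one."""
        return {k: v for k, v in self.original.items()
                if self.model[k[0]].get(k[1]) != v}  # net no-ops are discarded

def reverse(model: Dict[int, dict], history: Dict[Key, Any]) -> Dict[Key, Any]:
    """Documented (forward) delta: the current values of the recorded features."""
    return {k: model[k[0]].get(k[1]) for k in history}

model = {1: {"name": "Item"}, 4: {"name": "Order"}}
rec = Recorder(model)
rec.set(4, "name", "Tmp")
rec.set(4, "name", "Invoice")  # merged with the previous update of (4, name)
rec.set(1, "name", "X")
rec.set(1, "name", "Item")     # net no-op: discarded
history = rec.history_delta()
forward = reverse(model, history)
```

Four live updates collapse into a single change in both directions, which is the sense in which reversing a recorded delta may yield fewer changes than were made.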


[Figure: object diagrams of the initial synchronized models and of the propagation of delta a (case 4), delta b (case 1), delta c (case 9), delta d (case 10) and delta e (case 11).]

Fig. 2. Source/target metamodels, initial synchronized models and forward delta propagation (a-e).
The EMF change recorder makes it possible to defer the observation of updates to
the point at which they occur, saving memory resources, and it facilitates
interoperability. Furthermore, recorded (history) deltas can be regarded as a
rollback mechanism for implementing transactional model updates, which may be
performed live.

Figure 2 shows examples of documented deltas, defined over the source model $M_s$
of the running example. Such deltas are representable as EMF model changes, i.e.
operationally, but are graphically depicted using the abstract syntax of $M_s$, in
their state-based representation, for the sake of presentation. Additions and
updates, including moves, are highlighted in grey. Objects that are added, and
thus created, have a new identifier. Objects that are updated and/or moved
preserve their identifier. Removals are highlighted by using dashed contour lines
for the corresponding shapes. The given deltas are instantiations of case 4
(delta a), changing the name of the class Order to Invoice; case 1 (delta b),
adding a root class Product; case 9 (delta c), adding a single-valued attribute
amount to class Item; case 10 (delta d), removing the attribute date from class
Item; and case 11 (delta e), structurally moving the attribute date from class
Item to class Order.

In the following subsections, the different phases of the procedure for forward
propagation of source deltas are discussed, and the aforementioned examples are
used to illustrate them.


3.3 Impact Analysis 


In this subsection, we discuss how source documented deltas are analyzed in order
to determine which transformation steps are affected by source changes. This
analysis comprises three main steps: identification of atomic model updates from
a documented delta, initialization of locations for newly enabled rules, and
marking of transformation steps impacted by changes.


Identification of atomic model updates. In the first step, the transformation
engine infers which objects and which feature calls have been impacted by changes.
For objects, it also infers whether an object has been added or removed, ignoring
whether the object has been moved, either within the same collection or
structurally.

For affected objects, such information is recorded in the set DO of dirty objects
of the form $(o_s, ctype)$, where $o_s$ is the affected object and $ctype$ is the
type of change from the set $\{\mathit{ADD}, \mathit{DEL}\}$. To obtain the dirty
objects from the delta, FeatureChanges and ResourceChanges are traversed
considering two cases: (i) an object $o_s$ is added either to a containment feature
(for a FeatureChange) or to the root contents of the resource (for a
ResourceChange), and the object is not removed elsewhere in the delta, either from
a containment reference or from the root contents of the resource; and (ii),
similarly, an object is deleted and it is not added elsewhere in the delta. DO is
augmented with $(o_s, \mathit{ADD})$ in the first case and with $(o_s, \mathit{DEL})$
in the second case.

For affected feature calls, such information is recorded in the set DFC of dirty
feature calls of the form $(o_s, f)$, where $o_s$ is an object and $f$ is a feature
name. For each FeatureChange of an ObjectChange, the dirty feature call
$(o_s, f)$, with the object $o_s$ referred to by the ObjectChange and the feature
name $f$ referred to by the FeatureChange, is added to DFC.

Table 3. Impact analysis of source deltas a-e.

Delta | Case | DO | DFC | Rule | Source Match | Target Match | matchPool_δ | dirty?
a | 4 | - | (4, name) | C->T | c ↦ 4 | t ↦ 6, pk_col ↦ 7 | ✓ | ✓
b | 1 | (6, ADD) | - | C->T | c ↦ 6 | - | ✓ | -
c | 9 | (6, ADD) | (1, attr) | C->T | c ↦ 1 | t ↦ 1, pk_col ↦ 4 | ✓ | ✓
  |   |   |   | A->C | att ↦ 6 | - | ✓ | -
d | 10 | (3, DEL) | (1, attr) | C->T | c ↦ 1 | t ↦ 1, pk_col ↦ 4 | ✓ | ✓
  |   |   |   | A->C | att ↦ 3 | col ↦ 3 | - | ✓
e | 11 | - | (1, attr), (4, attr) | C->T | c ↦ 1 | t ↦ 1, pk_col ↦ 4 | ✓ | ✓
  |   |   |   | C->T | c ↦ 4 | t ↦ 6, pk_col ↦ 7 | ✓ | ✓

Table 2 shows how atomic model update types are represented using the EMF change
model (column Delta representation) and, internally, using the sets DO and DFC.
Table 3 shows the sets DO of dirty objects and DFC of dirty feature calls for the
source deltas of Fig. 2. Note that the sets DO and DFC decouple the transformation
engine from the EMF change model and provide another entry point for defining
deltas programmatically, which can be used for capturing atomic live changes
received via EMF adapters.
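The identification of DO and DFC can be sketched on a flattened encoding of a delta, in which each feature change is a tuple (owner, feature, containment?, added ids, removed ids) and root additions/removals stand in for ResourceChanges. The function `identify` and this encoding are hypothetical; they only show how an object that is both removed and added (a structural move) drops out of DO while still producing dirty feature calls.

```python
from typing import Iterable, Set, Tuple

# (owner id, feature name, is containment?, added ids, removed ids)
Change = Tuple[int, str, bool, Tuple[int, ...], Tuple[int, ...]]
Pair = Tuple[int, str]

def identify(changes: Iterable[Change], root_added: Iterable[int] = (),
             root_removed: Iterable[int] = ()) -> Tuple[Set[Pair], Set[Pair]]:
    changes = list(changes)
    # Every FeatureChange of an ObjectChange yields a dirty feature call.
    dfc = {(owner, feature) for owner, feature, _, _, _ in changes}
    added, removed = set(root_added), set(root_removed)
    for _, _, containment, adds, rems in changes:
        if containment:  # only containment and root changes add or delete objects
            added.update(adds)
            removed.update(rems)
    # An object both removed and added somewhere is a move: it is not dirty.
    do = {(o, "ADD") for o in added - removed} | {(o, "DEL") for o in removed - added}
    return do, dfc

do_a, dfc_a = identify([(4, "name", False, (), ())])
do_b, dfc_b = identify([], root_added=[6])
do_c, dfc_c = identify([(1, "attr", True, (6,), ())])
do_d, dfc_d = identify([(1, "attr", True, (), (3,))])
do_e, dfc_e = identify([(1, "attr", True, (), (3,)), (4, "attr", True, (3,), ())])
```

Under this encoding, the five results agree with the DO and DFC columns of Table 3 for deltas a-e.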


Initialization of delta locations. For each dirty object $(o_s, \mathit{ADD})$, the
object $o_s$ is added to the extent associated with $type(o_s)$ in the location map
used for delta propagation. This potentially enables new matches when rules are
matched during the delta propagation phase.


Marking of impacted transformation steps. In this step, transformation steps that
are affected by the atomic changes in the source delta are marked as dirty. For
each dirty object $(o_s, \mathit{ADD}) \in DO$, the extent of type $type(o_s)$ is
augmented with $o_s$. This will potentially enable new matches for some rule during
the change propagation phase. For each dirty object $(o_s, \mathit{DEL}) \in DO$, we
obtain the list of affected transformation steps from the map of reverse matches.
Such transformation steps then remain transient, and the objects in their target
match will not be linked to other objects in the target models. In particular,
note that when processing root objects or a containment reference, an object that
is removed in the delta is not present in the updated source model and,
therefore, it does not trigger the transformation step that had been executed in
the initial transformation.

For each dirty feature call $(o_s, f) \in DFC$, we obtain the list of affected
transformation steps from the registry of dependencies. For each such
transformation step, the satisfaction of its source match is checked. If the
source match is still valid, it is inserted into $\mathit{matchPool}_\delta$, the
pool of matches that are used to schedule rule applications during the change
propagation phase.
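The marking step can be sketched as follows, using the dependencies of Table 1 encoded as plain dictionaries with hypothetical step ids s1 (C->T for class 1), s2 (C->T for class 4), s3 and s4 (A->C) and s5 (R->FK). `mark_impacted` is an illustrative name, and two behaviours are assumptions not spelled out above: a step reached through a deleted object is marked dirty but never re-scheduled, and a step whose source match is no longer valid is also only marked dirty.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[int, str]

def mark_impacted(do: Set[Pair], dfc: Set[Pair],
                  reverse_matches: Dict[int, Set[str]],
                  feature_deps: Dict[Pair, Set[str]],
                  still_valid: Callable[[str], bool]):
    dirty: Set[str] = set()
    match_pool_delta: List[str] = []
    new_locations: List[int] = []
    deleted: Set[str] = set()
    for obj, ctype in sorted(do):
        if ctype == "ADD":
            new_locations.append(obj)  # the extent of type(obj) is augmented
        else:
            deleted |= reverse_matches.get(obj, set())
    dirty |= deleted                   # marked dirty, never re-scheduled
    for call in sorted(dfc):
        for step in sorted(feature_deps.get(call, set())):
            dirty.add(step)
            if step not in deleted and still_valid(step) and step not in match_pool_delta:
                match_pool_delta.append(step)  # re-schedule the step
    return dirty, match_pool_delta, new_locations

# Dependencies and reverse matches of Table 1.
FEATURE_DEPS = {
    (1, "name"): {"s1", "s5"}, (1, "attr"): {"s1"}, (5, "type"): {"s1", "s5"},
    (5, "multiValued"): {"s1"}, (4, "name"): {"s2", "s5"}, (4, "attr"): {"s2"},
    (2, "name"): {"s3"}, (3, "name"): {"s4"}, (5, "name"): {"s5"},
}
REVERSE_MATCHES = {1: {"s1"}, 4: {"s2"}, 2: {"s3"}, 3: {"s4"}, 5: {"s5"}}

def always_valid(step: str) -> bool:
    return True

result_c = mark_impacted({(6, "ADD")}, {(1, "attr")}, REVERSE_MATCHES, FEATURE_DEPS, always_valid)
result_d = mark_impacted({(3, "DEL")}, {(1, "attr")}, REVERSE_MATCHES, FEATURE_DEPS, always_valid)
result_e = mark_impacted(set(), {(1, "attr"), (4, "attr")}, REVERSE_MATCHES, FEATURE_DEPS, always_valid)
```

For deltas d and e the results agree with the matchPool_δ and dirty? columns of Table 3; for delta c, the new A->C step for attribute 6 only shows up here as a new location, since its match is found later, during the matching phase of change propagation.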
For each atomic change in Fig. 2, Table 3 shows the marking of transformation steps
that are (re-)scheduled according to the dependencies of Table 1. In particular, if
a transformation step is re-scheduled, its current source and target matches are
included, it is marked as dirty and it is included in $\mathit{matchPool}_\delta$.
If a transformation step is not to be re-executed, it is simply marked as dirty.
New transformation steps, with fresh matches due to new objects, are scheduled in
$\mathit{matchPool}_\delta$. This last step is actually achieved by augmenting the
corresponding type extent with the new objects, and the matches are scheduled
during the change propagation phase, explained in the next subsection.


3.4 Change Propagation 


After the impact analysis phase, delta propagation proceeds by executing a model
transformation using the matching and execution phases, as outlined in Sect. 2.
Figure 2 illustrates the propagation of source deltas according to the model
transformation of Fig. 1. We highlight below how incrementality has been
considered in these two phases.


Matching Phase. During the matching phase (in batch/initialization execution mode), matches for a given rule are found by traversing objects from the extent of the types associated with the elements of the source pattern of the rule, subject to the constraints specified as graphical patterns and when conditions. In propagation mode, the transformation engine employs the same pattern matching algorithm, but it fetches objects from the location map used for delta propagation, which is initialized during the change impact analysis phase. Therefore, new matches may be found for objects that have been created by the source delta. Those matches are inserted both into matchPool and into matchPool_Δ, scheduling new transformation steps. Table 3 shows that two new transformation steps are scheduled, one for rule C->T in delta b, and one for rule A->c in delta c.
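Propagation-mode matching can be sketched as follows, assuming single-element source patterns and modelling the location map as a dictionary from a type name to the ids of the objects created by the delta. The function and type names are hypothetical; the point is only that candidates come from the delta rather than from the full type extents, and that each new match lands in both pools.

```python
from typing import Callable, Dict, List, Tuple

Match = Tuple[str, str]  # (rule name, object id): single-element source patterns only
Rule = Tuple[str, Callable[[str], bool]]  # (source type, when condition)

def match_new_objects(rules: Dict[str, Rule],
                      delta_location_map: Dict[str, List[str]],
                      match_pool: List[Match],
                      match_pool_delta: List[Match]) -> None:
    """Find matches only among objects created by the source delta."""
    for rule_name, (source_type, when) in rules.items():
        # Candidates come from the delta's location map, not the whole extent.
        for obj in delta_location_map.get(source_type, []):
            if when(obj):
                m = (rule_name, obj)
                match_pool.append(m)        # all matches found so far
                match_pool_delta.append(m)  # new steps to run in this propagation
```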


Execution Phase. During the execution phase, the transformation steps determined by the matches in matchPool_Δ are executed. Such matches originate from the impact analysis phase, corresponding to transformation steps that are dirty and need to be re-executed, and from the matching phase above, corresponding to new transformation steps.

The re-execution of a transformation step is performed as in the batch/initialization mode except for the creation of transformation steps. Whereas a newly scheduled transformation step needs to get its output objects initialized (instantiated for output elements), a dirty transformation step reuses the objects of the target match and unsets their features. This avoids losing contextual information that is not affected by changes when re-executing a transformation step. In particular, those references to output objects that emerge from the external context are preserved. On the other hand, references from those output objects are re-calculated by re-executing the transformation step. It is worth noting that the transformation engine uses where clauses to define references to objects that are created by other rules, which in turn use a cache mechanism to avoid re-executing the transformation step that produced the referenced object. Therefore, when a dirty transformation rule is re-executed, the initialization of output element bindings is performed again. However, those bindings that are initialized in a where clause are also initialized incrementally. That is, only those objects that belong to a match of a newly scheduled transformation step will be transformed from scratch; references to already initialized objects are simply fetched. Hence, the target delta is as fine-grained (at binding level) as the source delta for the underlying graph structure of the model.
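The reuse of target objects and the cached resolution of cross-rule references can be illustrated with a toy engine. This is a sketch under simplifying assumptions, not YAMTL's API: `ToyEngine`, `TargetObject`, `resolve` and the `bind` callback are hypothetical, and each step produces exactly one output object.

```python
from typing import Callable, Dict

class TargetObject:
    """Output object; its identity is what external references point to."""
    def __init__(self, oid: str) -> None:
        self.oid = oid
        self.features: Dict[str, object] = {}

class ToyEngine:
    """Hypothetical engine with one output object per transformation step."""
    def __init__(self, bind: Callable[..., None]) -> None:
        self.bind = bind                          # fills the features of an output object
        self.trace: Dict[str, TargetObject] = {}  # source id -> output object (cache)
        self.created = 0

    def resolve(self, source_id: str) -> TargetObject:
        """'where'-style lookup: fetch the cached output, transforming only if missing."""
        if source_id not in self.trace:
            self.execute(source_id, dirty=False)
        return self.trace[source_id]

    def execute(self, source_id: str, dirty: bool) -> TargetObject:
        if dirty:
            out = self.trace[source_id]
            out.features.clear()  # reuse the object: unset features, keep its identity
        else:
            out = TargetObject("t_" + source_id)
            self.created += 1
            self.trace[source_id] = out
        self.bind(self, source_id, out)  # re-compute the outgoing bindings
        return out
```

Re-executing a dirty step returns the same object, so references held by other objects remain valid, and `resolve` fetches already transformed objects instead of creating new ones.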


4 Performance Analysis 


For the empirical analysis of the incremental execution of model transformations in YAMTL using the propagation procedure presented above, we have used the VIATRA CPS benchmark [27]. The transformation YAMTL-incr implemented for our model transformation engine passes the sanity checks of the benchmark. The software artifacts used in this section and the results obtained are publicly available in a GitHub repository [7], and YAMTL is available at https://yamtl.github.io/.

This evaluation is an extension of the one performed for the batch component of the VIATRA CPS benchmark in [6]. From the original VIATRA CPS benchmark, two incremental variants of the transformation implemented with EMF-IncQuery have been selected: ExplicitTraceability (EXPL) [25] and QueryResult Traceability (QRT) [26], the former being the best-performing solution to date. These transformations have been extracted as independent Java projects. Classes implementing them have been kept intact in the new projects, including their namespaces, so that no errors are introduced due to lack of expertise. Although these two transformations produce results that differ from those of the other transformations, the differences are mainly due to the reordering of multi-valued references, and we have considered them valid for this evaluation. In addition, a benchmark measurement harness following the best practices recommended by the VIATRA team [13] was developed, both to fine-tune measurements and to cross-check results. This harness removes dependencies on other components of the VIATRA CPS benchmark so that experiments can be run locally.

In the present work, we aimed at answering the following research questions: (RQ1) Does YAMTL-incr show any performance penalty w.r.t. its execution in batch mode (YAMTL-batch)? (RQ2) Does YAMTL-incr show any improvement in performance w.r.t. EXPL or QRT during the initialization phase? (RQ3) And during the propagation phase?

From the scenarios provided in the original benchmark, the client-server and statistics-based scenarios [29] were considered. The CPS model generator [28] was used to obtain the input models for the analysis, so that their size is determined by a size factor on a logarithmic scale. The biggest models considered, in the client-server scenario, consist of millions of nodes (10.16M) and edges (27.53M) and are, hence, VLMs.
For each tool and scenario, the experiments are run in isolation, i.e., in a separate Java process. For each of the input models, an initial experiment is performed to warm up the JVM, followed by twelve more experiments to measure performance. Each experiment consists of four phases: model load and engine initialization, initial transformation, delta propagation, and model storage. Between phases, the harness sends hints to the JVM to run garbage collection and waits for one second before proceeding to the next phase. The first phase includes the instantiation of a fresh engine instance, avoiding interference between experiments, as caches are not reused. The delta propagation phase includes the application of the delta to the source model and its propagation. Only the initial transformation and delta propagation times have been considered in the quantitative analysis. For each of these two phases, the reported result is the median of ten experiments, obtained after removing the minimum and the maximum of the twelve measurements.
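The aggregation of measurements described above amounts to a trimmed median. The following sketch shows it for one phase; the function name is illustrative and is not taken from the published harness [7].

```python
from statistics import median
from typing import List

def summarize(times_ms: List[float]) -> float:
    """Median of the runs left after dropping one minimum and one maximum."""
    if len(times_ms) < 3:
        raise ValueError("need at least three measurements")
    trimmed = sorted(times_ms)[1:-1]  # twelve runs -> ten remaining runs
    return median(trimmed)
```

Discarding the extremes before taking the median makes the reported value robust against a single outlier run, e.g. one affected by a late garbage collection.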

In both solutions EXPL [25] and QRT [26], the delta is applied to the source model by directly modifying the resource containing the model. In the solution with YAMTL, such a delta was recorded and persisted using the EMF change model, as described in Sect. 3.2. To analyze whether this difference could become a threat to validity, a separate experiment was run excluding the query part of the model update (searching for the objects to be updated) in the solution EXPL. As this change did not affect performance results perceptibly, the original solutions provided by the authors of the VIATRA CPS benchmark were considered. Therefore, the actions performed during the propagation phase are equivalent in all of the evaluated solutions.


[Figure: two plots with logarithmic axes showing time (ms) against model size factor (1 to 16384) for ViatraExpl, ViatraQRT, YAMTL-incr and YAMTL-batch]

Fig. 3. Performance of initialization (top) and delta propagation (bottom).
Figure 3 shows the performance results obtained both for the initial model transformation and for forward delta propagation for the models generated for the client-server scenario. Both the time (ms) on the Y axis and the model size factor on the X axis use logarithmic scales, which allows us to compare the scalability of the different approaches. In the initialization phase, we have included the execution of YAMTL in batch mode (YAMTL-batch) over the source model, and it can be seen that tracking dependencies incurs a small penalty. However, the other two solutions (EXPL and QRT) are several orders of magnitude slower. In the propagation phase, it can be observed that while YAMTL-incr exhibits a constant propagation time (in μs) for the source delta, the cost of the other solutions depends on the size of the input model. Furthermore, when initial and propagation times are combined, the performance of the other incremental approaches worsens due to their costly initialization phase.


5 Related Work 


In this section, we discuss techniques used in related work for achieving incre- 
mentality in both reactive and bidirectional model transformation. 

Reactive model transformations [3,21] enable the propagation of model 
updates from source models to target models on demand. State-of-the-art tool 
support relies on notification mechanisms, enabling live detection of source model 
updates either for immediate processing, as in VIATRA [3], or for deferred pro- 
cessing, as in ReactiveATL [21]. In these approaches, source model update notifi- 
cations are usually fine-grained and kept in memory. Such notifications can only 
be detected when the transformation engine is in memory (live) as well. The use 
of a notification mechanism means that models are loosely coupled to the trans- 
formation engine. Working with offline model updates, as in the proposed delta 
propagation procedure, completely decouples detection of deltas from the trans- 
formation engine, freeing model update developers from the overhead of hav- 
ing the transformation infrastructure in memory. The latter is only needed for 
propagating changes but not for defining them. In reactive approaches, when an 
observer receives an update notification, information about the intent of the over- 
all model delta, i.e. the contextual information relating different atomic updates, 
is lost. This problem is avoided using documented deltas, which may be serialized, enabling their processing (e.g., aggregating composite changes like the move operation) and optimization (e.g., reducing atomic operations that cancel each other out when composed). We refer the reader to [9] for an additional discussion of delta-based model updates against state-based model updates.
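The optimization of documented deltas mentioned above can be sketched on a simplified delta format. The sketch uses plain tuples rather than the EMF Change Model, and it assumes unique-valued features where every recorded add actually inserted a value that was absent before; under that assumption an add followed later by a remove of the same value leaves the model unchanged, so both entries can be dropped.

```python
from typing import List, Tuple

# (kind, object id, feature, value) with kind in {"add", "remove"}; hypothetical format
Change = Tuple[str, str, str, str]

def cancel_redundant(changes: List[Change]) -> List[Change]:
    """Drop each 'add' that a later 'remove' of the same value undoes, and that remove."""
    result: List[Change] = []
    for change in changes:
        kind, obj, feat, val = change
        inverse = ("add", obj, feat, val)
        if kind == "remove" and inverse in result:
            result.remove(inverse)  # the pair cancels out; keep neither entry
        else:
            result.append(change)
    return result
```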

Among bidirectional model transformation approaches, Triple Graph Gram- 
mars (TGG), introduced in [22], are a declarative approach for specifying bidi- 
rectional consistency relations between models. Although our approach is not 
bidirectional, it is worth comparing how incrementality is supported in opera- 
tional TGG rules. Incrementality was first introduced in TGG synchronization 
in [11,12]. Efficient approaches for TGG synchronization [18-20] avoid analyzing 
the whole model by relying on dependencies which hint at the impact of a model 
update directly. Precedence-based approaches [18,20] keep a binary precedence relation over the set of model elements in order to determine when the creation or deletion of a model element affects another one. While [18] overestimates the actual dependencies by defining them at the type level, others underestimate them by relying on user feedback [20] or on special correspondences [12]. The approach in [19] decouples impact analysis of model updates from consistency restoration by delegating the former to VIATRA's incremental pattern matcher, which has a built-in dependency tracker, and by defining operational rules using a reactive model transformation approach. However, these two phases are still tightly coupled through a synchronous communication mechanism between the incremental pattern matcher and the synchronization procedure, since the pattern matcher may trigger revocations/applications of forward marking rules after revoking/applying one of them. That is, the model synchronization procedure uses the pattern matcher to know when synchronization terminates. In the delta propagation mechanism proposed in the present work, neither the revocation of applied transformation steps nor the creation of new transformation steps can trigger further applications, because rule matches are computed against the source model and are unique, that is, the same match cannot enable two different rules. A new transformation step may be found when new elements are inserted in the source model. On the other hand, when a transformation step is revoked, no other rule can be applied, or a conflict would have been detected when the rule was applied the first time.

Some transformation engines with support for bidirectional transformations, 
like NMF [14,15], support the offline representation of model deltas. However, 
to the best of our knowledge, none of the aforementioned approaches uses a 
standardized notation for them, such as the EMF model change, which can be 
regarded as the de-facto standard for representing model deltas in the EMF 
modeling tool ecosystem. 


6 Concluding Remarks 


The main contribution of this work is the design of a delta propagation procedure 
for executing delta-driven model transformations, which has been implemented 
in YAMTL. The novelty of the approach consists in the use of a standardized representation of model deltas, which facilitates interoperability with EMF-compliant tools, and in the use of a dependency injection mechanism, which allows the transformation engine to be aware of model updates without having to rely on a publish-subscribe infrastructure. The VIATRA CPS benchmark has been used to justify that (1) the initial transformation in YAMTL is several orders of magnitude faster than the fastest incremental solutions to date and
that (2) propagation of sparse deltas can be performed in real time for VLMs, 
independently of their size, whereas other solutions show a clear dependence on 
their size. Hence, YAMTL shows satisfactory scalability in incremental execu- 
tion of model transformations on VLMs. Additional studies with larger classes 
of models will be considered in future work. 
Abstract. Graph repair, restoring consistency of a graph, plays a prominent role in several areas of computer science and beyond: For example, in model-driven engineering, the abstract syntax of models is usually encoded using graphs. Flexible edit operations temporarily create inconsistent graphs not representing a valid model, thus requiring graph repair. Similarly, in graph databases (managing the storage and manipulation of graph data), updates may cause a given database to violate some integrity constraints, also requiring graph repair.

We present a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing repairs. In our context, we formalize consistency by so-called graph conditions, which are equivalent to first-order logic on graphs. We present two kinds of repair algorithms: State-based repair restores consistency independent of the graph update history, whereas delta-based (or incremental) repair takes this history explicitly into account. Technically, our algorithms rely on an existing model generation algorithm for graph conditions implemented in AUTOGRAPH. Moreover, the delta-based approach uses the new concept of satisfaction trees (STs) for encoding if and how a graph satisfies a graph condition. We then demonstrate how to manipulate these STs incrementally with respect to a graph update.


1 Introduction 


Graph repair, restoring consistency of a graph, plays a prominent role in several 
areas of computer science and beyond. For example, in model-driven engineering, 
models are typically represented using graphs and the use of flexible edit opera- 
tions may temporarily create inconsistent graphs not representing a valid model, 
thus requiring graph repair. This includes the situation where different views of an artifact are represented by different models, i.e., the artifact is described by a multi-model (see, e.g., [6]), and updates in some models may cause a global inconsistency in the multi-model. Similarly, in graph databases (managing the storage and manipulation of graph data), updates may cause a given database to violate some integrity constraints [1], also requiring graph repair.

Numerous approaches to model inconsistency and repair (see [12] for an 
excellent recent survey) operate in varying frameworks with diverse assumptions. 
In our framework, we consider a typed directed graph (cf. [7]) to be inconsistent 
if it does not satisfy a given finite set of constraints, which are expressed by 
graph conditions [8], a formalism with the expressive power of first-order logic 
on graphs. A graph repair is, then, a description of an update that, if applied 
to the given graph, makes it consistent. Our algorithms do not just provide 
one repair, but a set of them from which the user must select the right repair 
to be applied. Moreover, we derive only least changing repairs, which do not 
include other smaller viable repairs. Our approach uses techniques (and the tool 
AUTOGRAPH) [17] designed for model generation of graph conditions. 

We consider two scenarios: In the first one, the aim is to repair a given graph 
(state-based repair). In the second one, a consistent graph is given together with 
an update that may make it inconsistent. In this case, the aim is to repair the 
graph in an incremental way (delta-based repair). 

The main contributions of the paper are the following ones: 


- A precise definition of what an update is, together with the definition of some properties, such as being least changing, that a repair update may satisfy.

- Two kinds of graph repair algorithms: state-based and incremental (for the delta-based case). Moreover, for all algorithms we demonstrate soundness (the repair result provided by the algorithms is consistent) and completeness¹ (upon termination, our algorithms will find all possible desired repairs).

In summary, most repair techniques do not provide guarantees for the functional semantics of the repair and suffer from a lack of information for the deployment of the techniques (see the conclusion of the survey [12]). With our logic-based graph repair approach, we aim at alleviating this weakness by formally presenting its functional semantics and describing the details of the underlying algorithms.

The paper is organized as follows: After introducing preliminaries in Sect. 2, we proceed in Sect. 3 with defining graph updates and repairs. In Sect. 4, we present the state-based scenario. We continue with introducing satisfaction trees in Sect. 5, which are needed for the delta-based scenario in Sect. 6. We close with a comparison with related work in Sect. 7 and a conclusion with outlook in Sect. 8.

For proofs of theorems and example details, we refer to our technical report [18].


2 Preliminaries on Graph Conditions 


We recall graph conditions (GCs), defined here over typed directed graphs, used for representing properties on such graphs. In our running example², we employ


1 Note that completeness implies totality (if the given set of constraints is satisfiable by a finite graph, then the algorithms will find a repair for any inconsistent graph).

2 We refer to Sect. 1 with pointers to related work, including diverse use cases in Software Engineering for graph repair with more complex and motivating examples.
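[Figure: the type graph TG with node types :A and :B (left) and the GC $\psi = \neg\exists(a, \neg(\exists(a \xrightarrow{e} b, \mathit{true}) \wedge \neg\exists(a \xrightarrow{e} a, \mathit{true})))$ (right), where $a \xrightarrow{e} a$ denotes a loop $e$ on node $a$]

Fig. 1. The type graph TG (left) and the GC $\psi$ (right) for our running example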


the type graph TG from Fig. 1, and we use nodes with names $a_i$ and $b_i$ to indicate that they are of type :A and :B, respectively.

GCs state facts about the existence of graph patterns in a given graph, called a host graph. For example, in the syntax used in our running example, the GC $\exists(a, \mathit{true})$ means that the host graph must include a node of type :A. Also, $\exists(a \rightarrow b, \mathit{true})$ means that the host graph must include a node of type :A, another node of type :B, and an edge from the :A-node to the :B-node.

In general, in the syntax that we use in our running example, an atomic GC is of the form $\exists(H, \phi)$ (or $\neg\exists(H, \phi)$), where H is a graph that must be (or must not be) included in the host graph and where $\phi$ is a condition expressing more restrictions on how this graph is found (or not found) in the host graph. For instance, $\exists(a, \neg\exists(a \xrightarrow{e} b, \mathit{true}))$ states that the host graph must include an :A-node such that it has no outgoing edge e to a :B-node. Moreover, we use the standard Boolean operators to combine atomic GCs to form more complex ones. For instance, $\exists(a, \neg(\exists(a \xrightarrow{e} b, \mathit{true}) \wedge \neg\exists(a \xrightarrow{e} a, \mathit{true})))$ states that the host graph must include an :A-node such that it does not hold that there is an outgoing edge e to a :B-node and node a has no loop. In addition, as an abbreviation for readability, we may use the universal quantifier with the meaning $\forall(H, \phi) = \neg\exists(H, \neg\phi)$. In this sense, the condition $\psi$ from Fig. 1, used in our running example, states that every node of type :A must have an outgoing edge to a node of type :B and that such an :A-node must have no loop.

Formally, the syntax of GCs [8], expressively equivalent to first-order logic on graphs [5], is given subsequently. This logic encodes properties of graph extensions, which must be explicitly mentioned as graph inclusions. For instance, the GC $\exists(a, \neg\exists(a \xrightarrow{e} b, \mathit{true}))$ in simplified notation is formally given in the syntax of GCs as $\exists(i_H, \neg\exists(a \hookrightarrow (a \xrightarrow{e} b), \mathit{true}))$, where $i_H$ denotes the inclusion $\emptyset \hookrightarrow H$ with H the graph consisting of node a. This is because it expresses a property of the extension $i_H$. Moreover, therein the GC $\neg\exists(a \hookrightarrow (a \xrightarrow{e} b), \mathit{true})$ is actually a property of the extension $a \hookrightarrow (a \xrightarrow{e} b)$.


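Definition 1 (Graph Conditions (GCs) [8]). The class of graph conditions $\mathcal{GC}_H$ for the graph H is defined inductively:

- $\wedge S \in \mathcal{GC}_H$ if $S \subseteq_{fin} \mathcal{GC}_H$.
- $\neg\phi \in \mathcal{GC}_H$ if $\phi \in \mathcal{GC}_H$.
- $\exists(a : H \hookrightarrow H', \phi) \in \mathcal{GC}_H$ if $\phi \in \mathcal{GC}_{H'}$.

In addition, $\mathit{true}$, $\mathit{false}$, $\vee S$, $\phi_1 \Rightarrow \phi_2$, and $\forall(a, \phi)$ can be used as abbreviations, with their obvious replacement.

A mono $m : H \hookrightarrow G$ satisfies a GC $\psi \in \mathcal{GC}_H$, written $m \models_{\mathrm{GC}} \psi$, if one of the following cases applies.

- $\psi = \wedge S$ and $m \models_{\mathrm{GC}} \phi$ for each $\phi \in S$.
- $\psi = \neg\phi$ and not $m \models_{\mathrm{GC}} \phi$.
- $\psi = \exists(a : H \hookrightarrow H', \phi)$ and $\exists q : H' \hookrightarrow G.\ q \circ a = m \wedge q \models_{\mathrm{GC}} \phi$.

A graph G satisfies a GC $\psi \in \mathcal{GC}_\emptyset$, written $G \models_{\mathrm{GC}} \psi$ or $G \in [\![\psi]\!]$, if $i_G \models_{\mathrm{GC}} \psi$.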
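For small graphs, the satisfaction relation of Definition 1 can be executed directly. The following brute-force sketch is ours, not AUTOGRAPH, and relies on several encoding assumptions: an inclusion $a : H \hookrightarrow H'$ is represented by letting H' reuse the element ids of H, graph elements are `("node", type)` or `("edge", source, target)` (edge types are omitted), node and edge ids are disjoint, and a GC is a nested tuple `("and", [...])`, `("not", φ)` or `("exists", H', φ)`. Since it enumerates all injective extensions, it is exponential and only meant for illustration.

```python
from itertools import permutations
from typing import Dict, Iterator, Tuple

# Element id -> ("node", type) or ("edge", source id, target id).
Graph = Dict[str, Tuple[str, ...]]
Mono = Dict[str, str]  # pattern element id -> host element id
TRUE = ("and", [])

def _preserved(k: str, h2: Graph, g: Graph, q: Mono) -> bool:
    elem = h2[k]
    if elem[0] == "node":
        return g[q[k]] == elem  # same node type
    return g[q[k]] == ("edge", q[elem[1]], q[elem[2]])  # incidence preserved

def extensions(m: Mono, h2: Graph, g: Graph) -> Iterator[Mono]:
    """All injective q : H' -> G with q restricted to H equal to m (i.e. q∘a = m)."""
    new = [k for k in h2 if k not in m]
    free = [k for k in g if k not in m.values()]
    for image in permutations(free, len(new)):
        q = {**m, **dict(zip(new, image))}
        if all(_preserved(k, h2, g, q) for k in h2):
            yield q

def sat(m: Mono, phi, g: Graph) -> bool:
    if phi[0] == "and":
        return all(sat(m, p, g) for p in phi[1])
    if phi[0] == "not":
        return not sat(m, phi[1], g)
    _, h2, sub = phi  # ("exists", H', sub-condition)
    return any(sat(q, sub, g) for q in extensions(m, h2, g))

def graph_sat(g: Graph, psi) -> bool:
    return sat({}, psi, g)  # i_G : ∅ -> G is the empty mono
```

With $\psi$ from Fig. 1 encoded this way, a graph $a_1 \rightarrow b_1 \leftarrow a_2$ satisfies $\psi$, whereas a graph where $a_2$ carries a loop and has no edge to a :B-node does not.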


3 Graph Updates and Repairs 


In this section, we define graph updates to formalize arbitrary modifications of graphs, graph repairs as the desired graph updates resulting in repaired graphs, as well as further desirable properties of graph updates.

In particular, it is well known that a modification or update of $G_1$ resulting in a graph $G_2$ can be represented by two inclusions or, in general, two monos, which we denote by $(l : I \hookrightarrow G_1, r : I \hookrightarrow G_2)$, where I represents the part of $G_1$ that is preserved by this update. Intuitively, $l : I \hookrightarrow G_1$ describes the deletion of elements from $G_1$ (i.e., all elements in $G_1 \setminus l(I)$ are deleted) and $r : I \hookrightarrow G_2$ describes the addition of elements to I to obtain $G_2$ (i.e., all elements in $G_2 \setminus r(I)$ are added).


Definition 2 (Graph Update). A (graph) update u is a pair $(l : I \hookrightarrow G_1, r : I \hookrightarrow G_2)$ of monos. The class of all updates is denoted by $\mathcal{U}$.


Graph updates such as $(i_G : \emptyset \hookrightarrow G, i_G : \emptyset \hookrightarrow G)$, where G is not the empty graph, delete all the elements in G, which are then added again by r. To rule out such updates, we define an update $(l : I \hookrightarrow G_1, r : I \hookrightarrow G_2)$ to be canonical when the graph I is as large as possible, i.e., intuitively $I = G_1 \cap G_2$. Formally:

Definition 3 (Canonical Graph Update). If $(l : I \hookrightarrow G_1, r : I \hookrightarrow G_2) \in \mathcal{U}$ and every $(l' : I' \hookrightarrow G_1, r' : I' \hookrightarrow G_2) \in \mathcal{U}$ and mono $i : I \hookrightarrow I'$ with $l' \circ i = l$ and $r' \circ i = r$ satisfies that i is an isomorphism, then $(l, r)$ is canonical, written $(l, r) \in \mathcal{U}_{can}$.

An update $u_1$ is a sub-update (see [14]) of $u$ whenever the modifications defined by $u_1$ are fully contained in the modifications defined by $u$. Intuitively, this is the case when $u_1$ can be composed with another update $u_2$ such that (a) the resulting update has the same effect as $u$ and (b) $u_2$ does not delete any element that was added before by $u_1$. This is stated, informally speaking, by requiring that $I$ is the intersection (pullback) of $I_1$ and $I_2$ and that $G_3$ is their union (pushout).

Definition 4 (Sub-update [14]). If $u = (l : I \hookrightarrow G_1, r : I \hookrightarrow G_2) \in \mathcal{U}$, $u_1 = (l_1 : I_1 \hookrightarrow G_1, r_1 : I_1 \hookrightarrow G_3) \in \mathcal{U}$, $u_2 = (l_2 : I_2 \hookrightarrow G_3, r_2 : I_2 \hookrightarrow G_2) \in \mathcal{U}$, $(l'_1 : I \hookrightarrow I_1, r'_2 : I \hookrightarrow I_2)$ is the pullback of $(r_1, l_2)$, and $(r_1, l_2)$ is the pushout of $(l'_1, r'_2)$, then $u_1$ is a sub-update of $u$, written $u_1 \leq^{u_2} u$ or simply $u_1 \leq u$.

$G_1 \xleftarrow{l_1} I_1 \xrightarrow{r_1} G_3 \xleftarrow{l_2} I_2 \xrightarrow{r_2} G_2$

Moreover, we write $u_1 <^{u_2} u$ or $u_1 < u$ when $u_1 \leq^{u_2} u$ and not $u \leq u_1$.


We now define graph repairs as graph updates where the result graph satisfies the given consistency constraint $\psi$.

Definition 5 (Graph Repair). If $u = (l : I \hookrightarrow G_1, r : I \hookrightarrow G_2) \in \mathcal{U}$, $\psi \in \mathcal{GC}_\emptyset$, and $G_2 \models_{\mathrm{GC}} \psi$, then u is a graph repair or simply repair of $G_1$ with respect to $\psi$, written $u \in \mathcal{U}(G_1, \psi)$.


To define a finite set of desirable repairs, we introduce the notion of least changing repairs, that is, repairs for which no sub-updates exist that are also repairs.

Definition 6 (Least Changing Graph Repair). If $\psi \in \mathcal{GC}_\emptyset$, $u = (l : I \hookrightarrow G_1, r : I \hookrightarrow G_2) \in \mathcal{U}(G_1, \psi)$, and there is no $u' \in \mathcal{U}(G_1, \psi)$ such that $u' < u$, then u is a least changing graph repair of $G_1$ with respect to $\psi$, written $u \in \mathcal{U}_{lc}(G_1, \psi)$.


Note that every least changing repair is canonical according to this definition. Moreover, the notion of least changing repairs is unrelated to other notions of repairs, such as the set of all repairs that require the smallest number of atomic modifications of the graph at hand to result in a graph satisfying the consistency constraint. For instance, a repair $u_1$ adding two nodes of type :A may be a least changing repair even if there is a repair $u_2$ adding only one node of type :B.

A graph repair algorithm is stable [12] if the repair procedure returns the identity update $(id_G : G \hookrightarrow G, id_G : G \hookrightarrow G)$ when graph G is already consistent. Obviously, a graph repair algorithm that only returns least changing repairs is stable, since the identity update is a sub-update of any other repair.
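Under strong simplifying assumptions, the sub-update order and the filtering of least changing repairs can be sketched on element sets. We assume inclusions with element ids preserved across graphs and canonical updates, so that ids added by u never occur in $G_1$. In this setting, Definition 4 reduces to checking that $u_1$ starts from the same $G_1$ and that its deleted and added ids are subsets of those of u, because $I_2 = I \cup A_1$ then yields the required pullback and pushout. The filter only compares a finite list of candidates, whereas Definition 6 quantifies over all repairs; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Element id -> () for a node, (source id, target id) for an edge (types omitted).
Graph = Dict[str, Tuple[str, ...]]
# (I, G1, G2) with identity-based inclusions, assumed canonical (I = G1 ∩ G2).
Update = Tuple[Graph, Graph, Graph]

def deleted_added(u: Update) -> Tuple[Set[str], Set[str]]:
    i, g1, g2 = u
    return set(g1) - set(i), set(g2) - set(i)

def is_sub_update(u1: Update, u: Update) -> bool:
    """u1 <= u: same input graph, and u1 deletes and adds only what u does."""
    d, a = deleted_added(u)
    d1, a1 = deleted_added(u1)
    return u1[1] == u[1] and d1 <= d and a1 <= a

def strictly_below(v: Update, u: Update) -> bool:
    return is_sub_update(v, u) and not is_sub_update(u, v)

def least_changing(repairs: List[Update]) -> List[Update]:
    """Keep the candidates that have no strictly smaller candidate in the list."""
    return [u for u in repairs if not any(strictly_below(v, u) for v in repairs)]
```

The identity update deletes and adds nothing, so it is below every update of the same graph, which matches the stability argument above.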


4 State-Based Repair 


In this section, we introduce two state-based graph repair algorithms (see [18] for additional technical detail), which compute a set of graph repairs restoring consistency for a given graph.

Definition 7 (State-Based Graph Repair Algorithm). A state-based graph repair algorithm takes a graph G and a GC $\psi \in \mathcal{GC}_\emptyset$ as inputs and returns a set of graph repairs in $\mathcal{U}(G, \psi)$.

Note that the tool AUTOGRAPH [17] can be used to verify this condition as follows: It determines the operation $\mathcal{A}$ that constructs a finite set of all minimal graphs satisfying a given GC $\psi$. Formally, $\mathcal{A}(\psi) = \bigcap\{S \subseteq [\![\psi]\!] \mid \forall G' \in [\![\psi]\!].\ \exists G \in S.\ \exists m : G \hookrightarrow G'.\ \mathit{true}\}$. While AUTOGRAPH may not terminate when computing this operation due to the inherent expressiveness of GCs, it is known that AUTOGRAPH terminates whenever $\psi$ is not satisfied by any graph.

The state-based algorithm $\mathit{Repair}_{sb,1}$ uses $\mathcal{A}$ to obtain repairs. $\mathit{Repair}_{sb,1}$ computes the set $\mathcal{A}(\psi \wedge \exists(i_G, \mathit{true}))$ that contains all minimal graphs that (a) satisfy $\psi$ and (b) include a copy of G. Each of these extensions of G corresponds to a graph repair. For our running example, we do not obtain any repair for graph $G'_u$ from Fig. 2 and GC $\psi$ from Fig. 1, because the loop on node $a_2$ would invalidate any graph including $G'_u$. We state that $\mathit{Repair}_{sb,1}$ indeed computes the non-deleting least changing graph repairs.


Theorem 1 (Functional Semantics of $\mathit{Repair}_{sb,1}$). $\mathit{Repair}_{sb,1}$ is sound, i.e., $\mathit{Repair}_{sb,1}(G, \psi) \subseteq \mathcal{U}_{lc}(G, \psi)$, and complete (upon termination) with respect to non-deleting repairs in $\mathcal{U}_{lc}(G, \psi)$.


The second state-based algorithm $\mathit{Repair}_{sb,2}$ computes all least changing graph repairs. In this algorithm, we use the approach of $\mathit{Repair}_{sb,1}$ but compute $\mathcal{A}(\psi \wedge \exists(i_{G_e}, \mathit{true}))$ whenever an inclusion $l : G_e \hookrightarrow G$ describes how G can be restricted to one of its subgraphs $G_e$. Every graph $G'$ obtained from the application of $\mathcal{A}$ for one of these graphs $G_e$ then results in one graph repair returned by $\mathit{Repair}_{sb,2}$, except for those that are not least changing.

To this end, we introduce the notion of a restriction tree (see the example in Fig. 2), having as nodes all subgraphs $G_e$ of a given graph G that include the graph $G_{min}$, which is the empty graph in the state-based algorithm $\mathit{Repair}_{sb,2}$ but not in the delta-based algorithm of Sect. 6, and where the edges of this tree are given by inclusions that add precisely one node or edge.


Definition 8 (Restriction Tree RT). If G and $G_{min}$ are graphs, $S = \{l : G_e \hookrightarrow G_p \mid G_{min} \subseteq G_e \subseteq G_p \subseteq G,\ l \text{ is an inclusion}\}$, and $S'$ is the least subset of S such that the closure of $S'$ under $\circ$ equals S, then a restriction tree $RT(G, G_{min})$ is a least subset of S such that for all two inclusions $l_1 : G_e \hookrightarrow G_1 \in S'$ and $l_2 : G_e \hookrightarrow G_2 \in S'$, one of them is in $RT(G, G_{min})$.


Considering our running example, the restriction tree in Fig. 2 is traversed entirely except for the four graphs without a border, which are not traversed as they have the supergraph marked 9 satisfying $\psi$; traversing them would generate repairs that are not least changing. The resulting graph repairs for the condition $\psi$ are given by the graphs marked 3-6.
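The traversal of a restriction tree can be sketched as a level-by-level search that removes one node or edge per tree edge, never removes elements of $G_{min}$, and only removes a node once no remaining edge is incident to it. The `stop` predicate is a hypothetical stand-in for the check that a restriction already satisfies $\psi$; subgraphs of such a restriction are pruned, mirroring the graphs without a border in Fig. 2. Graphs are plain dictionaries from element ids to endpoints (types omitted), which is an assumption of this sketch rather than the paper's data structure.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, Tuple[str, ...]]  # id -> () for a node, (src, tgt) for an edge
Restriction = FrozenSet[str]        # a subgraph G_e, given by its element ids

def removable(g: Graph, x: str) -> bool:
    """A node can only be removed once no remaining edge is incident to it."""
    return all(x not in ends for k, ends in g.items() if k != x)

def restriction_tree(g: Graph, g_min: Graph,
                     stop: Callable[[Graph], bool]) -> List[Tuple[Restriction, Restriction]]:
    """Return (parent, child) pairs; each child has one more element removed.
    Restrictions where stop() holds are kept but not expanded, and subgraphs
    of such restrictions are pruned."""
    root: Restriction = frozenset(g)
    seen: Set[Restriction] = {root}
    stopped: List[Restriction] = []
    tree: List[Tuple[Restriction, Restriction]] = []
    level = [root]
    while level:
        stopped += [s for s in level if stop({k: g[k] for k in s})]
        next_level = []
        for cur in level:
            if cur in stopped:
                continue
            sub = {k: g[k] for k in cur}
            for x in sorted(cur - set(g_min)):
                child = cur - {x}
                if not removable(sub, x) or child in seen:
                    continue
                if any(child <= s for s in stopped):
                    continue  # a larger restriction already satisfies the condition
                seen.add(child)
                tree.append((cur, child))  # the first parent found is kept
                next_level.append(child)
        level = next_level
    return tree
```

Processing one level at a time ensures that every restriction satisfying the condition is known before any of its subgraphs could be generated.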

Our second state-based graph repair algorithm is indeed sound and complete 
whenever the calls to AUTOGRAPH using A terminate. 


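Theorem 2 (Functional Semantics of $\mathit{Repair}_{sb,2}$). $\mathit{Repair}_{sb,2}$ is sound, i.e., $\mathit{Repair}_{sb,2}(G, \psi) \subseteq \mathcal{U}_{lc}(G, \psi)$, and complete, i.e., $\mathcal{U}_{lc}(G, \psi) \subseteq \mathit{Repair}_{sb,2}(G, \psi)$, upon termination.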


5 Satisfaction Trees 


The state-based algorithms introduced in the previous section are inefficient when used in a scenario where a graph needs repair after each update in a sequence of updates. We thus present in Sect. 6 an incremental algorithm reducing the computational cost for a repair when an update is provided. This algorithm uses an additional data structure, called satisfaction tree or ST, which stores information on if and how a graph G satisfies a GC $\psi$ (according to Definition 1). In this section, given $\psi$ and G, we define how such an ST $\gamma$ is constructed and how it is updated once the graph G is updated.

Fig. 2. The restriction tree $RT(G'_u, \emptyset)$ (enclosed by the polygon) and four graph repairs (marked 3-6) generated using $\mathit{Repair}_{sb,2}$

If $\psi$ is a conjunction of conditions, its associated ST $\gamma$ is a conjunction of STs, and if $\psi$ is a negation of a condition, its associated $\gamma$ is a negation of an ST. In the case when $\psi$ is $\exists(a : H \hookrightarrow H', \phi)$, recall that a match $m : H \hookrightarrow G$ satisfies $\psi$ if there exists a $q : H' \hookrightarrow G$ such that $m = q \circ a$ and $q \models_{\mathrm{GC}} \phi$. For this case, we keep in the ST each q satisfying these two conditions and also each q that satisfies the first condition but not the second. More precisely, for the case of existential quantification, the corresponding ST is of the form $\exists(a : H \hookrightarrow H', \underline{\phi}, m_\top, m_\bot)$, where $m_\top$ and $m_\bot$ are partial mappings (we use $\mathrm{sup}(f)$ to denote the elements actually mapped by a partial map f) that map matches $q : H' \hookrightarrow G$ satisfying $m = q \circ a$ (for a previously known $m : H \hookrightarrow G$) to an ST for the subcondition $\underline{\phi}$. The difference between both partial maps is that $m_\top$ maps matches q to STs for which $q \models_{\mathrm{GC}} \phi$, while $m_\bot$ maps matches q to STs for which $q \not\models_{\mathrm{GC}} \phi$. Consider Fig. 3b for an example of an ST $\gamma_u$.

The following definition describes the syntax of STs. The STs are defined over matches into a graph G to allow for the basic well-formedness condition that every mapped match q satisfies $q \circ a = m$.


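Definition 9 (Satisfaction Trees (STs)). The class $\mathcal{T}^{\mathrm{ST}}_{m}$ of all satisfaction trees for a mono $m : H \hookrightarrow G$ contains $\gamma$ if one of the following cases applies.

- $\gamma = \wedge S$ and $S \subseteq_{\mathrm{fin}} \mathcal{T}^{\mathrm{ST}}_{m}$.
- $\gamma = \neg\chi$ and $\chi \in \mathcal{T}^{\mathrm{ST}}_{m}$.
- $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, $a : H \hookrightarrow H'$, $\phi \in \Phi^{\mathrm{GC}}_{H'}$, $m_t, m_f \subseteq_{\mathrm{fin}} \{(q : H' \hookrightarrow G, \chi) \mid q \circ a = m, \chi \in \mathcal{T}^{\mathrm{ST}}_{q}\}$, and $m_t$, $m_f$ are partial maps.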
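As a minimal sketch of Definition 9 (our own simplification, not the authors' implementation), the Python fragment below encodes the three forms of STs and checks the well-formedness condition $q \circ a = m$. It assumes simple untyped graphs whose edges have no identity, monos given as dicts on nodes, and $a : H \hookrightarrow H'$ taken to be an inclusion; the class names and the helper `well_formed` are hypothetical.

```python
from dataclasses import dataclass


# Simplified graphs (our assumption): nodes plus directed edges (src, tgt);
# edges have no identity of their own, so parallel edges are not supported.
@dataclass(frozen=True)
class Graph:
    nodes: frozenset
    edges: frozenset = frozenset()


# The three ST forms of Definition 9. A match q is stored as a frozenset of
# (node of H', node of G) pairs so that it can serve as a dictionary key.
@dataclass
class STAnd:
    subs: list


@dataclass
class STNot:
    sub: object


@dataclass
class STExists:
    h: Graph      # H, with a : H -> H' taken to be an inclusion
    h2: Graph     # H'
    sub: object   # the underlined subcondition, kept for readability only
    mt: dict      # match q -> sub-ST (q satisfies the subcondition)
    mf: dict      # match q -> sub-ST (q does not satisfy it)


def well_formed(st, m):
    """Check that every stored match q has domain H' and satisfies q o a = m."""
    if isinstance(st, STAnd):
        return all(well_formed(x, m) for x in st.subs)
    if isinstance(st, STNot):
        return well_formed(st.sub, m)
    for key, child in list(st.mt.items()) + list(st.mf.items()):
        q = dict(key)
        if set(q) != set(st.h2.nodes):
            return False
        # With a an inclusion, q o a = m means q agrees with m on H's nodes.
        if any(q.get(n) != m.get(n) for n in st.h.nodes):
            return False
        if not well_formed(child, q):
            return False
    return True


# Usage: an exists-node over the empty graph storing the matches x->1, x->2.
E, X = Graph(frozenset()), Graph(frozenset({"x"}))
st = STExists(E, X, None, {frozenset({("x", 1)}): STAnd([])},
              {frozenset({("x", 2)}): STAnd([])})
print(well_formed(st, {}))
```

The check recurses with the extended match $q$, mirroring the fact that sub-STs stored under $q$ belong to $\mathcal{T}^{\mathrm{ST}}_{q}$.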
[Fig. 3 panels (graphs and ST formulas not recoverable from the extraction):
(a) A graph update $u = (l_u : I_u \hookrightarrow G_u, r_u : I_u \hookrightarrow G'_u)$.
(b) The ST $\gamma_u$ for $G_u$ (see Fig. 3a) and $\psi$ (see Fig. 1).
(c) The ST for $I_u$ (see Fig. 3a) and $\psi$ (see Fig. 1) that is obtained as the backward propagation $\mathrm{ppgB}(\gamma_u, l_u)$ from $\gamma_u$ (see Fig. 3b) and $l_u$ (see Fig. 3a).
(d) The ST $\gamma'_u$ for $G'_u$ (see Fig. 3a) and $\psi$ (see Fig. 1) that is obtained as the forward propagation $\mathrm{ppgF}$ of the ST from Fig. 3c along $r_u$ (see Fig. 3a). Also, $\gamma'_u$ is the result of $\mathrm{ppgU}(\gamma_u, u)$, which applies backward and forward propagation. The viable points for the delta-based repair discussed in Sect. 6 are indicated by (R1)-(R3).]

Fig. 3. A graph update and an ST with its propagation over the graph update, where GCs are underlined in STs for readability


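The following satisfaction predicate $\models_{\mathrm{ST}}$ for STs defines when an ST $\gamma$ for a mono $m$ states that the contained GC $\psi$ is satisfied by the morphism $m$.

Definition 10 (ST Satisfaction). An ST $\gamma \in \mathcal{T}^{\mathrm{ST}}_{m}$ is satisfied, written $\models_{\mathrm{ST}} \gamma$, if one of the following cases applies.

- $\gamma = \wedge S$ and $\models_{\mathrm{ST}} \chi$ (for each $\chi \in S$).
- $\gamma = \neg\chi$ and $\not\models_{\mathrm{ST}} \chi$.
- $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$ and $m_t \neq \emptyset$.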
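Definition 10 only inspects whether $m_t$ is empty at existential nodes, so checking an ST never re-evaluates matches against the graph. A small Python sketch of this predicate follows; the class names are hypothetical, and the hand-built example ST uses strings as illustrative match labels. Its shape follows our reading of $\gamma_u$ in Fig. 3b: no match of the outer existential has a satisfied sub-ST, so the outer negation holds.

```python
from dataclasses import dataclass, field


@dataclass
class STAnd:
    subs: list


@dataclass
class STNot:
    sub: object


@dataclass
class STExists:
    label: str                              # the underlined GC, for display only
    mt: dict = field(default_factory=dict)  # matches with satisfied sub-STs
    mf: dict = field(default_factory=dict)  # matches with unsatisfied sub-STs


def st_sat(st):
    """The predicate |=_ST of Definition 10."""
    if isinstance(st, STAnd):
        return all(st_sat(x) for x in st.subs)
    if isinstance(st, STNot):
        return not st_sat(st.sub)
    if isinstance(st, STExists):
        # Only m_t is inspected: the sub-STs are not evaluated again.
        return len(st.mt) > 0
    raise TypeError(f"not an ST: {st!r}")


TRUE = STAnd([])  # the ST of the condition true
inner_a1 = STNot(STAnd([STExists("a->b", {"a1->b1": TRUE}), STNot(STExists("loop"))]))
inner_a2 = STNot(STAnd([STExists("a->b", {"a2->b1": TRUE}), STNot(STExists("loop"))]))
gamma = STNot(STExists("a", {}, {"a1": inner_a1, "a2": inner_a2}))
print(st_sat(gamma))  # True
```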


The following recursive operation constructs an ST $\gamma$ for a graph $G$ and a condition $\psi$ so that $\gamma$ represents how $G$ satisfies (or does not satisfy) $\psi$. Note that the match $m$ in the definition of STs above and in the construction of an ST below corresponds to the match $m : H \hookrightarrow G$ from Definition 1, which we operationalize in the following definition. For conjunction and negation, we construct the STs from the STs for the subconditions. For the case of existential quantification, we consider all morphisms $q : H' \hookrightarrow G$ for which the triangle $q \circ a = m$ commutes and construct the STs for the subcondition $\phi$ under this extended match $q$. The resulting STs are inserted into $m_t$ and $m_f$ according to whether they are satisfied.


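Definition 11 (Construct ST (cst)). Given $m : H \hookrightarrow G$ and $\psi \in \Phi^{\mathrm{GC}}_{H}$, we define $\mathrm{cst}(\psi, m) = \gamma$, with $\gamma \in \mathcal{T}^{\mathrm{ST}}_{m}$, as follows.

- If $\psi = \wedge S$, then $\gamma = \wedge\{\mathrm{cst}(\phi, m) \mid \phi \in S\}$.
- If $\psi = \neg\phi$, then $\gamma = \neg\mathrm{cst}(\phi, m)$.
- If $\psi = \exists(a : H \hookrightarrow H', \phi)$, $m_{\mathrm{all}} = \{(q : H' \hookrightarrow G, \chi) \mid q \circ a = m, \mathrm{cst}(\phi, q) = \chi\}$, $m_t = \{(q, \chi) \in m_{\mathrm{all}} \mid \models_{\mathrm{ST}} \chi\}$, and $m_f = m_{\mathrm{all}} \setminus m_t$, then $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$.

If $G$ is a graph and $\psi \in \Phi^{\mathrm{GC}}_{\emptyset}$, then $\mathrm{cst}(\psi, G) = \mathrm{cst}(\psi, i_G)$.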


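This construction of STs ensures, for $\gamma = \mathrm{cst}(\psi, G)$, that $\models_{\mathrm{ST}} \gamma$ if and only if $G \models_{\mathrm{GC}} \psi$. Note that $\models_{\mathrm{ST}} \gamma_u$ holds for the ST $\gamma_u$ from Fig. 3b, the GC $\psi$ from Fig. 1, and the graph $G_u$ from Fig. 3a.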


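Theorem 3 (Sound Construction of STs). Given $m : H \hookrightarrow G$, $\psi \in \Phi^{\mathrm{GC}}_{H}$, and $\mathrm{cst}(\psi, m) = \gamma$, then $\models_{\mathrm{ST}} \gamma$ iff $m \models_{\mathrm{GC}} \psi$.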
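To illustrate Definition 11 and Theorem 3 together, the following self-contained Python sketch builds STs by brute-force enumeration of the extensions $q$ of $m$ and compares $\models_{\mathrm{ST}}$ on the constructed ST with a direct evaluation of $\models_{\mathrm{GC}}$. The assumptions are ours. Graphs are simple and untyped, matches are dicts on nodes, and $a$ is an inclusion. The constraint "every node has an edge to another node and no loop" is hypothetical and only loosely modelled on the running example. The enumeration is exponential and meant for tiny graphs only.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Graph:
    nodes: frozenset
    edges: frozenset = frozenset()

# Graph conditions; Exists(h, h2, sub) encodes exists(a : H -> H', sub),
# with a assumed to be the inclusion of H into H'.
@dataclass(frozen=True)
class And:
    subs: tuple = ()

@dataclass(frozen=True)
class Not:
    sub: object

@dataclass(frozen=True)
class Exists:
    h: Graph
    h2: Graph
    sub: object

@dataclass
class STAnd:
    subs: list

@dataclass
class STNot:
    sub: object

@dataclass
class STExists:
    gc: Exists
    mt: dict
    mf: dict

def extensions(m, h, h2, g):
    """Brute-force all injective q : H' -> G that extend m and preserve edges."""
    new = sorted(h2.nodes - h.nodes)
    free = sorted(g.nodes - set(m.values()))
    for image in permutations(free, len(new)):
        q = {**m, **dict(zip(new, image))}
        if all((q[s], q[t]) in g.edges for s, t in h2.edges):
            yield q

def sat(m, gc, g):
    """Direct evaluation of m |=_GC gc, used here only as a reference."""
    if isinstance(gc, And):
        return all(sat(m, s, g) for s in gc.subs)
    if isinstance(gc, Not):
        return not sat(m, gc.sub, g)
    return any(sat(q, gc.sub, g) for q in extensions(m, gc.h, gc.h2, g))

def st_sat(st):
    if isinstance(st, STAnd):
        return all(st_sat(x) for x in st.subs)
    if isinstance(st, STNot):
        return not st_sat(st.sub)
    return bool(st.mt)

def cst(gc, m, g):
    """Definition 11: build the ST, filing each extension q into m_t or m_f."""
    if isinstance(gc, And):
        return STAnd([cst(s, m, g) for s in gc.subs])
    if isinstance(gc, Not):
        return STNot(cst(gc.sub, m, g))
    mt, mf = {}, {}
    for q in extensions(m, gc.h, gc.h2, g):
        child = cst(gc.sub, q, g)
        (mt if st_sat(child) else mf)[frozenset(q.items())] = child
    return STExists(gc, mt, mf)

# Hypothetical constraint: every node has an edge to another node and no loop.
TRUE = And()
E = Graph(frozenset())
X = Graph(frozenset({"x"}))
XY = Graph(frozenset({"x", "y"}), frozenset({("x", "y")}))
XL = Graph(frozenset({"x"}), frozenset({("x", "x")}))
psi = Not(Exists(E, X, Not(And((Exists(X, XY, TRUE), Not(Exists(X, XL, TRUE)))))))

for edges in [{(1, 2), (2, 1)}, {(1, 2)}, {(1, 2), (2, 1), (1, 1)}]:
    g = Graph(frozenset({1, 2}), frozenset(edges))
    print(sorted(edges), st_sat(cst(psi, {}, g)), sat({}, psi, g))
```

The two columns printed per graph agree, as Theorem 3 predicts. The ST additionally records which matches witness the violation, for instance node 2 in the second graph.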


Subsequently, we define a propagation operation $\mathrm{ppgU}$ of an ST $\gamma$ for a graph update $u = (l : I \hookrightarrow G, r : I \hookrightarrow G')$ to obtain an ST $\gamma'$ such that $\gamma' = \mathrm{cst}(\psi, G')$ whenever $\gamma = \mathrm{cst}(\psi, G)$. This overall propagation is performed by a backward propagation of $\gamma$ for $l$ using the operation $\mathrm{ppgB}$, followed by a forward propagation of the resulting ST for $r$ using the operation $\mathrm{ppgF}$.

For backward propagation, we describe how the deletion of elements in $G$ by $l : I \hookrightarrow G$ affects its associated ST $\gamma$. To this end, we preserve those matches $q : H \hookrightarrow G$ for which no matched elements are deleted. This is formalized by requiring a mono $q' : H \hookrightarrow I$ such that $l \circ q' = q$. The matches $q$ with deleted matched elements cannot be preserved and are therefore removed.


Definition 12 (Propagate Match (ppgMatch)). If $q : H \hookrightarrow G$ and $l : I \hookrightarrow G$ are monos, then $\mathrm{ppgMatch}(q, l)$ is the unique $q' : H \hookrightarrow I$ such that $l \circ q' = q$ if it exists, and $\bot$ otherwise.
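A possible reading of Definition 12 in code follows. It assumes node maps given as dicts and edges without identity, so an edge of $H$ survives exactly when its image lies in $I$. `None` plays the role of $\bot$, and the function name is hypothetical.

```python
def ppg_match(q, h_edges, l, i_edges):
    """Return q' : H -> I with l o q' = q, or None (playing the role of ⊥).

    q and l are node maps given as dicts; h_edges and i_edges are the edge
    sets of H and I. Because l is injective, q' is unique when it exists.
    """
    inverse = {g: i for i, g in l.items()}  # l^-1 on the image of l
    if any(v not in inverse for v in q.values()):
        return None  # q matches a node that the update deletes
    q2 = {n: inverse[v] for n, v in q.items()}
    if any((q2[s], q2[t]) not in i_edges for s, t in h_edges):
        return None  # q matches an edge that the update deletes
    return q2


# G has nodes 1, 2, 3; I keeps nodes 1 and 2 (named "i1", "i2") and edge 1->2.
l = {"i1": 1, "i2": 2}
i_edges = {("i1", "i2")}
h_edges = {("x", "y")}
print(ppg_match({"x": 1, "y": 2}, h_edges, l, i_edges))  # {'x': 'i1', 'y': 'i2'}
print(ppg_match({"x": 2, "y": 3}, h_edges, l, i_edges))  # None
```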


The following recursive backward propagation defines how deletions affect the maps $m_t$ and $m_f$ of the given ST. That is, when $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, we (a) entirely remove a mapping $(q, \chi)$ from $m_t$ or $m_f$ if $\mathrm{ppgMatch}(q, l) = \bot$, and (b) construct, for a mapping $(q, \chi)$ from $m_t$ or $m_f$ with $\mathrm{ppgMatch}(q, l) \neq \bot$, the pair $(\mathrm{ppgMatch}(q, l), \chi')$, where $\chi'$ is obtained by recursively applying the backward propagation to $\chi$. The updated pair $(\mathrm{ppgMatch}(q, l), \chi')$ must be rechecked to decide to which partial map it must be added, to ensure that the resulting ST corresponds to the ST that would be constructed for $I$ directly.
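Definition 13 (Backward Propagation (ppgB)). If $m : H \hookrightarrow G$, $\gamma \in \mathcal{T}^{\mathrm{ST}}_{m}$, $l : I \hookrightarrow G$, $\mathrm{ppgMatch}(m, l) = m' : H \hookrightarrow I$, and $\gamma' \in \mathcal{T}^{\mathrm{ST}}_{m'}$, then $\mathrm{ppgB}(\gamma, l) = \gamma'$ if one of the following cases applies.

- $\gamma = \wedge S$ and $\gamma' = \wedge\{\mathrm{ppgB}(\chi, l) \mid \chi \in S\}$.
- $\gamma = \neg\chi$ and $\gamma' = \neg\mathrm{ppgB}(\chi, l)$.
- $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, $m_{\mathrm{all}} = \{(q', \chi') \mid (q, \chi) \in m_t \cup m_f \wedge \mathrm{ppgMatch}(q, l) = q' \neq \bot \wedge \mathrm{ppgB}(\chi, l) = \chi'\}$, $m'_t = \{(q', \chi') \in m_{\mathrm{all}} \mid \models_{\mathrm{ST}} \chi'\}$, $m'_f = m_{\mathrm{all}} \setminus m'_t$, and $\gamma' = \exists(a, \underline{\phi}, m'_t, m'_f)$.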


Note that $\mathrm{ppgMatch}(i_G, l) = i_I$ and, hence, the operation $\mathrm{ppgB}$ is applicable to all STs $\gamma \in \mathcal{T}^{\mathrm{ST}}_{i_G}$, which is sufficient as we define consistency constraints using GCs over the empty graph as well.
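The backward propagation of Definition 13 only drops matches and re-sorts the survivors between $m_t$ and $m_f$. It never searches the graph. The sketch below shows this on a hand-built ST for "some node $x$ has an edge $x \rightarrow y$". It assumes hypothetical names, simple untyped graphs, and matches stored as frozensets of node pairs.

```python
from dataclasses import dataclass

@dataclass
class STAnd:
    subs: list

@dataclass
class STNot:
    sub: object

@dataclass
class STExists:
    h2_edges: frozenset  # edges of H', needed to re-check propagated matches
    mt: dict             # frozenset(q.items()) -> sub-ST with |=_ST
    mf: dict             # frozenset(q.items()) -> sub-ST without |=_ST

def st_sat(st):
    if isinstance(st, STAnd):
        return all(st_sat(x) for x in st.subs)
    if isinstance(st, STNot):
        return not st_sat(st.sub)
    return bool(st.mt)

def ppg_match(q, h_edges, l, i_edges):
    """q' with l o q' = q, or None (for bottom); l maps I-nodes to G-nodes."""
    inverse = {g: i for i, g in l.items()}
    if any(v not in inverse for v in q.values()):
        return None
    q2 = {n: inverse[v] for n, v in q.items()}
    if any((q2[s], q2[t]) not in i_edges for s, t in h_edges):
        return None
    return q2

def ppg_b(st, l, i_edges):
    """Backward propagation (Definition 13) along a deletion l : I -> G."""
    if isinstance(st, STAnd):
        return STAnd([ppg_b(x, l, i_edges) for x in st.subs])
    if isinstance(st, STNot):
        return STNot(ppg_b(st.sub, l, i_edges))
    mt, mf = {}, {}
    for key, child in list(st.mt.items()) + list(st.mf.items()):
        q2 = ppg_match(dict(key), st.h2_edges, l, i_edges)
        if q2 is None:
            continue  # (a) the match used a deleted element: drop it
        child2 = ppg_b(child, l, i_edges)  # (b) recurse, then re-sort
        (mt if st_sat(child2) else mf)[frozenset(q2.items())] = child2
    return STExists(st.h2_edges, mt, mf)

# Hand-built ST for "some node x has an edge x -> y" over G = {1->2, 3->2}.
TRUE = STAnd([])
XY = frozenset({("x", "y")})
gamma = STExists(frozenset(), {
    frozenset({("x", 1)}): STExists(XY, {frozenset({("x", 1), ("y", 2)}): TRUE}, {}),
    frozenset({("x", 3)}): STExists(XY, {frozenset({("x", 3), ("y", 2)}): TRUE}, {}),
}, {frozenset({("x", 2)}): STExists(XY, {}, {})})

# Deleting edge 3->2 (l is the identity on nodes) moves the match x=3 into m_f.
res = ppg_b(gamma, {1: 1, 2: 2, 3: 3}, frozenset({(1, 2)}))
print(sorted(dict(k)["x"] for k in res.mt), sorted(dict(k)["x"] for k in res.mf))  # [1] [2, 3]
```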

In the case of forward propagation, where additions are given by $r : I \hookrightarrow G'$, we can preserve all matches by composing them with $r$. However, the addition of further elements may also result in additional matches that satisfy the conditions to be included in the corresponding $m_t$ and $m_f$ of the ST at hand.


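Definition 14 (Forward Propagation (ppgF)). If $\gamma \in \mathcal{T}^{\mathrm{ST}}_{m : H \hookrightarrow I}$, $r : I \hookrightarrow G'$, and $\gamma' \in \mathcal{T}^{\mathrm{ST}}_{r \circ m}$, then $\mathrm{ppgF}(\gamma, r) = \gamma'$ if one of the following cases applies.

- $\gamma = \wedge S$ and $\gamma' = \wedge\{\mathrm{ppgF}(\chi, r) \mid \chi \in S\}$.
- $\gamma = \neg\chi$ and $\gamma' = \neg\mathrm{ppgF}(\chi, r)$.
- $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, $m_{\mathrm{all}} = \{(r \circ q, \chi') \mid (q, \chi) \in m_t \cup m_f \wedge \mathrm{ppgF}(\chi, r) = \chi'\} \cup \{(q, \gamma_q) \mid q \circ a = r \circ m \wedge \nexists q' \in \mathrm{sup}(m_t) \cup \mathrm{sup}(m_f).\ r \circ q' = q, \mathrm{cst}(\phi, q) = \gamma_q\}$, $m'_t = \{(q, \chi) \in m_{\mathrm{all}} \mid \models_{\mathrm{ST}} \chi\}$, $m'_f = m_{\mathrm{all}} \setminus m'_t$, and $\gamma' = \exists(a, \underline{\phi}, m'_t, m'_f)$.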


We now define the composition of both propagations to obtain the operation 
ppgU that updates an ST for an entire graph update. 


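Definition 15 (Update Propagation (ppgU)). If $m : H \hookrightarrow G$, $\gamma \in \mathcal{T}^{\mathrm{ST}}_{m}$, $l : I \hookrightarrow G$, $\mathrm{ppgMatch}(m, l) = m' : H \hookrightarrow I$, and $r : I \hookrightarrow G'$, then $\mathrm{ppgU}(\gamma, (l, r)) = \mathrm{ppgF}(\mathrm{ppgB}(\gamma, l), r) \in \mathcal{T}^{\mathrm{ST}}_{r \circ m'}$.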


The overall propagation given by this operation is incremental, in the sense that 
the operation cst is only used in the forward propagation on parts of the graph 
G’, where the addition of graph elements by r from the graph update results in 
additional matches q according to the satisfaction relation for GCs. Finally, we 
state that ppgU incrementally computes the ST obtained using cst. The proof of 
this theorem relies on the fact that this property also holds for ppgB and ppgF. 


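Theorem 4 (ppgU is Compatible with cst). If $G$ is a graph, $\psi \in \Phi^{\mathrm{GC}}_{\emptyset}$, $l : I \hookrightarrow G$, and $r : I \hookrightarrow G'$, then $\mathrm{ppgU}(\mathrm{cst}(\psi, G), (l, r)) = \mathrm{cst}(\psi, G')$.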
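Definitions 14 and 15 and Theorem 4 can be checked on small examples with the following sketch, which is ours and not the authors' implementation. To keep it short, it assumes that $l$ and $r$ are inclusions, so $\mathrm{ppgMatch}(q, l)$ is either $q$ or $\bot$, and $r \circ q = q$. Graphs are simple and untyped, and the constraint is hypothetical. Only matches that are new in $G'$ trigger a call to `cst`, which is the incremental aspect described above.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Graph:
    nodes: frozenset
    edges: frozenset = frozenset()

@dataclass(frozen=True)
class And:
    subs: tuple = ()

@dataclass(frozen=True)
class Not:
    sub: object

@dataclass(frozen=True)
class Exists:  # exists(a : H -> H', sub) with a the inclusion of h into h2
    h: Graph
    h2: Graph
    sub: object

@dataclass
class ST:
    kind: str   # "and", "not" or "exists"
    gc: object  # the GC this ST belongs to
    subs: list  # child STs of "and"/"not"
    mt: dict    # "exists": match -> satisfied sub-ST
    mf: dict    # "exists": match -> unsatisfied sub-ST

def extensions(m, h, h2, g):
    new = sorted(h2.nodes - h.nodes)
    free = sorted(g.nodes - set(m.values()))
    for image in permutations(free, len(new)):
        q = {**m, **dict(zip(new, image))}
        if all((q[s], q[t]) in g.edges for s, t in h2.edges):
            yield q

def st_sat(st):
    if st.kind == "and":
        return all(st_sat(x) for x in st.subs)
    if st.kind == "not":
        return not st_sat(st.subs[0])
    return bool(st.mt)

def split(gc, pairs):
    """File (match, sub-ST) pairs into m_t or m_f by ST satisfaction."""
    mt = {k: x for k, x in pairs if st_sat(x)}
    mf = {k: x for k, x in pairs if not st_sat(x)}
    return ST("exists", gc, [], mt, mf)

def cst(gc, m, g):
    if isinstance(gc, And):
        return ST("and", gc, [cst(s, m, g) for s in gc.subs], {}, {})
    if isinstance(gc, Not):
        return ST("not", gc, [cst(gc.sub, m, g)], {}, {})
    return split(gc, [(frozenset(q.items()), cst(gc.sub, q, g))
                      for q in extensions(m, gc.h, gc.h2, g)])

def ppg_b(st, i):
    """Definition 13 for an inclusion l : I -> G (ppgMatch gives q or bottom)."""
    if st.kind != "exists":
        return ST(st.kind, st.gc, [ppg_b(x, i) for x in st.subs], {}, {})
    pairs = []
    for key, x in list(st.mt.items()) + list(st.mf.items()):
        q = dict(key)
        if all(v in i.nodes for v in q.values()) and all(
                (q[s], q[t]) in i.edges for s, t in st.gc.h2.edges):
            pairs.append((key, ppg_b(x, i)))  # keep, recurse, re-sort below
    return split(st.gc, pairs)

def ppg_f(st, m, g2):
    """Definition 14 for an inclusion r : I -> G' (so r o q = q)."""
    if st.kind != "exists":
        return ST(st.kind, st.gc, [ppg_f(x, m, g2) for x in st.subs], {}, {})
    old = list(st.mt.items()) + list(st.mf.items())
    pairs = [(key, ppg_f(x, dict(key), g2)) for key, x in old]
    known = {key for key, _ in old}
    for q in extensions(m, st.gc.h, st.gc.h2, g2):
        if frozenset(q.items()) not in known:  # only cst for new matches
            pairs.append((frozenset(q.items()), cst(st.gc.sub, q, g2)))
    return split(st.gc, pairs)

def ppg_u(st, i, g2):  # Definition 15, starting from the initial match i_G
    return ppg_f(ppg_b(st, i), {}, g2)

# Hypothetical constraint: every node has an edge to another node, no loop.
TRUE = And()
E, X = Graph(frozenset()), Graph(frozenset({"x"}))
XY = Graph(frozenset({"x", "y"}), frozenset({("x", "y")}))
XL = Graph(frozenset({"x"}), frozenset({("x", "x")}))
psi = Not(Exists(E, X, Not(And((Exists(X, XY, TRUE), Not(Exists(X, XL, TRUE)))))))
# An update in the spirit of Fig. 3a: delete edge 3->2, add a loop on node 3.
g = Graph(frozenset({1, 2, 3}), frozenset({(1, 2), (3, 2)}))
i = Graph(frozenset({1, 2, 3}), frozenset({(1, 2)}))
g2 = Graph(frozenset({1, 2, 3}), frozenset({(1, 2), (3, 3)}))
print(ppg_u(cst(psi, {}, g), i, g2) == cst(psi, {}, g2))
```

The final comparison is exactly the equation of Theorem 4, instantiated for this toy update.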


6 Delta-Based Repair 


The local states of delta-based graph repair algorithms may contain, besides the 
current graph as in state-based graph repair algorithms, an additional value. In 
our delta-based graph repair algorithm this will be an ST. 
Fig. 4. An example for delta-based graph repair using $\mathrm{Repair}_{\mathrm{db}}$


Definition 16 (Delta-Based Graph Repair Algorithm). Delta-based graph repair algorithms take a graph $G$, a GC $\psi \in \Phi^{\mathrm{GC}}_{\emptyset}$, and a value $q$ as inputs and return a set of pairs $(u, q')$, where $u \in \mathcal{U}(G, \psi)$ is a graph repair and $q'$ is a value.


Our delta-based graph repair algorithm $\mathrm{Repair}_{\mathrm{db}}$ is based on the single-step operation $\mathrm{Repair}_{\mathrm{db1}}$. Given a graph $G$, a GC $\psi \in \Phi^{\mathrm{GC}}_{\emptyset}$, the ST $\gamma$ that equals $\mathrm{cst}(\psi, G)$, and a graph update $u = (l : I \hookrightarrow G, r : I \hookrightarrow G')$, the algorithm $\mathrm{Repair}_{\mathrm{db}}$ first updates $\gamma$ using $\mathrm{ppgU}$ for the graph update $u$ and then determines, if necessary, graph repairs for the resulting ST $\gamma'$ using $\mathrm{Repair}_{\mathrm{db1}}$, according to the repair rules described in the following. The algorithm $\mathrm{Repair}_{\mathrm{db}}$ applies $\mathrm{Repair}_{\mathrm{db1}}$ in a breadth-first manner to obtain multi-step repairs.

For our example from Fig. 3a, such a multi-step repair of $G'_u$ is given in Fig. 4, where the graph updates obtained result in the graphs marked 1-3, of which only the graph marked 1 satisfies $\psi$. The algorithm $\mathrm{Repair}_{\mathrm{db}}$ then computes further graph updates, resulting in the graph marked 4, which also satisfies $\psi$.

The operation $\mathrm{Repair}_{\mathrm{db1}}$ for deriving single-step repairs depends on two local modifications. Firstly, a GC $\exists(a : H \hookrightarrow H', \phi)$ occurring as a subcondition in the consistency constraint $\psi$ may be violated because, for the match $m : H \hookrightarrow G$ that locates a copy of $H$ in the graph $G$ under repair, no suitable match $q : H' \hookrightarrow G$ can be found for which $q \circ a = m$ and $q \models_{\mathrm{GC}} \phi$ are satisfied. The operation $\mathrm{Repair}_{\mathrm{add}}$ resolves this violation by (a) using AutoGraph to construct a suitable graph $H_a$, and (b) integrating this graph $H_a$ into $G$, resulting in $G'$ such that a suitable match $q : H' \hookrightarrow G'$ can be found.


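Definition 17 (Local Addition Operation $\mathrm{Repair}_{\mathrm{add}}$). If $a : H \hookrightarrow H'$, $\phi \in \Phi^{\mathrm{GC}}_{H'}$, $m : H \hookrightarrow G$, $H_a \in \mathcal{A}(\exists(a, \phi))$, $k : H \hookrightarrow H_a$, and $(m' : H_a \hookrightarrow G', r : G \hookrightarrow G')$ is the pushout of $(m, k)$, then $r \in \mathrm{Repair}_{\mathrm{add}}(a, \phi, m)$.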




In our running example, $\mathrm{Repair}_{\mathrm{add}}$ determines a graph repair resulting in the graph marked 2 in Fig. 4. For this repair, we considered the sub-ST marked by (R2) in Fig. 3d, where the morphism $m$ matches the node $a$ from $\psi$ to the node $a_2$ in $G'_u$, but no extension of $m$ can also match a node $:B$ and an edge between these two nodes. The repair then uses $a \rightarrow b$ for the graph $H_a$, resulting in the addition of the node $b_2$ and the edge from $a_2$ to $b_2$.
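The gluing step of Definition 17 can be sketched with sets, assuming simple untyped graphs and an inclusion $k : H \hookrightarrow H_a$. Computing $H_a$ itself (the AutoGraph step) is out of scope, so it is passed in as an argument. The function name `repair_add` and the fresh node names `n0`, `n1`, ... are hypothetical. Because edges lack identity here, an edge of $H_a$ that already exists in $G$ is merged rather than duplicated, so this only approximates a pushout.

```python
from itertools import count


def repair_add(g_nodes, g_edges, h_nodes, ha_nodes, ha_edges, m):
    """Glue H_a into G along m : H -> G (pushout of (m, k), k an inclusion).

    Nodes of H_a outside H become fresh nodes of G'; the edges of H_a are
    added between the images. Returns the nodes and edges of G' and the
    match m' : H_a -> G'.
    """
    nodes = set(g_nodes)
    m2 = dict(m)
    fresh = (f"n{i}" for i in count())
    for n in sorted(ha_nodes - h_nodes):
        m2[n] = next(x for x in fresh if x not in nodes)  # new node in G'
        nodes.add(m2[n])
    edges = set(g_edges) | {(m2[s], m2[t]) for s, t in ha_edges}
    return nodes, edges, m2


# Running example as we read Fig. 3a: G'_u has a1 -> b1 and a loop on a2;
# the sub-ST (R2) matches a to a2, and H_a = a -> b adds a node and an edge.
nodes, edges, m2 = repair_add({"a1", "b1", "a2"}, {("a1", "b1"), ("a2", "a2")},
                              {"a"}, {"a", "b"}, {("a", "b")}, {"a": "a2"})
print(sorted(nodes), sorted(edges))
```

The fresh node `n0` plays the role of $b_2$ in Fig. 4.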
Secondly, a GC $\exists(a : H \hookrightarrow H', \phi)$ occurring as a subcondition in the consistency constraint $\psi$ may be satisfied even though it should not be, when it occurs underneath some negation. Such a violation is determined, again for a given match $m : H \hookrightarrow G$, by some match $q : H' \hookrightarrow G$ satisfying $q \circ a = m$ and $q \models_{\mathrm{GC}} \phi$. The local repair operation $\mathrm{Repair}_{\mathrm{del}}$ repairs such an undesired satisfaction by selecting a graph $H_p$ such that $H \subseteq H_p \subset H'$ using a restriction tree (see Definition 8) and deleting $G_{\mathrm{del}} = q(H') \setminus q(H_p)$ from $G$. Technically, we cannot use the pushout complement of $a'$ and $q$, as it does not exist when edges from $G \setminus G_{\mathrm{del}}$ are attached to nodes in $G_{\mathrm{del}}$. Hence, we determine the pushout complement of $a''$ and $k'$, which must be constructed suitably for this purpose.


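Definition 18 (Local Deletion Operation $\mathrm{Repair}_{\mathrm{del}}$). If $a : H \hookrightarrow H'$, $q : H' \hookrightarrow G$, $a' : H_p \hookrightarrow H' \in \mathrm{RT}(H', H)$, $m_1 : H' \hookrightarrow X_2$, where $X_2$ is obtained from $q(H')$ by adding all edges (with their nodes) that are connected to nodes in $q(H') \setminus q(a'(H_p))$, $k' : X_2 \hookrightarrow G$ is obtained such that $k' \circ m_1 = q$, $m_2 : H_p \hookrightarrow X_1$, where $X_1$ is obtained from $H_p$ by adding all nodes in $X_2 \setminus q(H')$, $a'' : X_1 \hookrightarrow X_2$ is obtained such that $a'' \circ m_2 = m_1 \circ a'$, and $(l : G' \hookrightarrow G, m : X_1 \hookrightarrow G')$ is the pushout complement of $(a'', k')$, then $l \in \mathrm{Repair}_{\mathrm{del}}(a, q)$.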




In our example, $\mathrm{Repair}_{\mathrm{del}}$ determines a repair resulting in the graph marked 1 in Fig. 4. For this repair, we considered the sub-ST marked by (R1) in Fig. 3d, where the mono $m$ matches the node $a$ from $\psi$ to the node $a_2$ in $G'_u$. The repair then uses $H_p = \emptyset$ for the removal of the node $a_2$ along with its adjacent loop (for which the technical handling in $\mathrm{Repair}_{\mathrm{del}}$ is required).
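The net effect of Definition 18 can be approximated with plain set operations. The nodes and edges of $q(H')$ that are not in $q(H_p)$ are deleted, together with all edges attached to deleted nodes; keeping such edges from dangling is what the auxiliary graphs $X_1$ and $X_2$ are needed for. The Python sketch below (hypothetical names, simple untyped graphs) does not construct the pushout complement itself.

```python
def repair_del(g_nodes, g_edges, q, hp_nodes, hp_edges, h2_edges):
    """Delete q(H') minus q(H_p) from G, together with dangling edges.

    q : H' -> G is the match to break (a dict on nodes); H_p, given by its
    nodes and edges, is the part of H' to keep; h2_edges are the edges of H'.
    Returns the nodes and edges of G' for the deletion l : G' -> G.
    """
    del_nodes = {q[n] for n in q if n not in hp_nodes}
    matched = {(q[s], q[t]) for s, t in h2_edges}  # edges of q(H')
    kept = {(q[s], q[t]) for s, t in hp_edges}     # edges of q(H_p)
    nodes = set(g_nodes) - del_nodes
    edges = {
        (s, t) for s, t in g_edges
        if s in nodes and t in nodes  # edges at deleted nodes would dangle
        and ((s, t) not in matched or (s, t) in kept)
    }
    return nodes, edges


# (R1) as we read it: H' is the single node a, q maps a to a2, H_p is empty.
g_nodes = {"a1", "b1", "a2"}
g_edges = {("a1", "b1"), ("a2", "a2")}
print(repair_del(g_nodes, g_edges, {"a": "a2"}, set(), set(), set()))
```

The loop on $a_2$ is removed even though it is not part of $q(H')$, which corresponds to the technical handling mentioned above.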

The recursive operation $\mathrm{Repair}_{\mathrm{db1}}$ below derives updates from an ST $\gamma$ that corresponds to the current graph $G$ (for our running example, these are $\gamma'_u$ and $G'_u$ from Fig. 3d). In the algorithm $\mathrm{Repair}_{\mathrm{db}}$, we apply $\mathrm{Repair}_{\mathrm{db1}}$ to the initial match $i_G$, the ST $\gamma$, and the boolean true, which indicates that we want $\gamma$ to be satisfied. This boolean is negated in Rule 3 whenever the recursion is applied to an ST $\neg\gamma'$, because we expect $\gamma'$ not to be satisfied iff we expect $\neg\gamma'$ to be satisfied. For conjunction, we attempt to repair one sub-ST for $b = \mathrm{true}$ in Rule 1, or we attempt to break one sub-ST for $b = \mathrm{false}$ in Rule 2. For existential quantification and $b = \mathrm{true}$, we use $\mathrm{Repair}_{\mathrm{add}}$ as discussed before in Rule 4, or we attempt to repair one existing match contained in $m_f$ in Rule 5. Also, for existential quantification and $b = \mathrm{false}$, we use $\mathrm{Repair}_{\mathrm{del}}$ as discussed before in Rule 6, or we attempt to break one existing match contained in $m_t$ in Rule 7.


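Definition 19 (Single-Step Delta-Based Repair Algorithm $\mathrm{Repair}_{\mathrm{db1}}$). If $m : H \hookrightarrow G$, $\gamma \in \mathcal{T}^{\mathrm{ST}}_{m}$, and $b \in \mathbb{B}$, then $(l : I \hookrightarrow G, r : I \hookrightarrow G') \in \mathrm{Repair}_{\mathrm{db1}}(m, \gamma, b)$ if one of the following cases applies.

- Rule 1 (repair one subcondition of a conjunction): $b = \mathrm{true}$, $\gamma = \wedge S$, $\chi \in S$, $\not\models_{\mathrm{ST}} \chi$, $(l, r) \in \mathrm{Repair}_{\mathrm{db1}}(m, \chi, b)$.
- Rule 2 (break one subcondition of a conjunction): $b = \mathrm{false}$, $\gamma = \wedge S$, $\chi \in S$, $\models_{\mathrm{ST}} \chi$, $(l, r) \in \mathrm{Repair}_{\mathrm{db1}}(m, \chi, b)$.
- Rule 3 (repair/break the subcondition of a negation): $\gamma = \neg\chi$, $(l, r) \in \mathrm{Repair}_{\mathrm{db1}}(m, \chi, \neg b)$.
- Rule 4 (repair an existential quantification by local extension): $b = \mathrm{true}$, $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, $m_t = \emptyset$, $r \in \mathrm{Repair}_{\mathrm{add}}(a, \phi, m)$, $l = \mathrm{id}_G$.
- Rule 5 (repair an existential quantification recursively): $b = \mathrm{true}$, $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, $m_t = \emptyset$, $m_f(k) = \chi$, $(l, r) \in \mathrm{Repair}_{\mathrm{db1}}(k, \chi, b)$.
- Rule 6 (break an existential quantification by local removal): $b = \mathrm{false}$, $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, $m_t(k) \neq \bot$, $l \in \mathrm{Repair}_{\mathrm{del}}(a, k)$, $r = \mathrm{id}_I$.
- Rule 7 (break an existential quantification recursively): $b = \mathrm{false}$, $\gamma = \exists(a, \underline{\phi}, m_t, m_f)$, $m_t(k) = \chi$, $(l, r) \in \mathrm{Repair}_{\mathrm{db1}}(k, \chi, b)$.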
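The case distinction of Definition 19 maps directly onto a recursive generator. In the following sketch, the names are hypothetical and the ST classes use illustrative string labels. Rules 4 and 6 yield symbolic actions instead of actual graph updates. The example ST is our reading of $\gamma'_u$ for $G'_u$, with an edge $a_1 \rightarrow b_1$ and a loop $e_3$ on $a_2$. Under that reading, the three actions it yields appear to correspond to the points (R1)-(R3).

```python
from dataclasses import dataclass, field


@dataclass
class STAnd:
    subs: list


@dataclass
class STNot:
    sub: object


@dataclass
class STExists:
    label: str                              # the underlined GC
    mt: dict = field(default_factory=dict)  # match -> satisfied sub-ST
    mf: dict = field(default_factory=dict)  # match -> unsatisfied sub-ST


def st_sat(st):
    if isinstance(st, STAnd):
        return all(st_sat(x) for x in st.subs)
    if isinstance(st, STNot):
        return not st_sat(st.sub)
    return bool(st.mt)


def repair_db1(m, st, b):
    """Yield symbolic single-step repairs following Rules 1-7.

    ("add", label, m) stands for Repair_add at match m and ("del", label, k)
    for Repair_del at match k; constructing the graph updates is omitted.
    """
    if isinstance(st, STAnd):
        for x in st.subs:
            if st_sat(x) != b:                     # Rule 1 (b) / Rule 2 (not b)
                yield from repair_db1(m, x, b)
    elif isinstance(st, STNot):
        yield from repair_db1(m, st.sub, not b)    # Rule 3
    elif b and not st.mt:
        yield ("add", st.label, m)                 # Rule 4
        for k, x in st.mf.items():
            yield from repair_db1(k, x, b)         # Rule 5
    elif not b:
        for k, x in st.mt.items():
            yield ("del", st.label, k)             # Rule 6
            yield from repair_db1(k, x, b)         # Rule 7


TRUE = STAnd([])
gamma_a2 = STNot(STAnd([STExists("a->b"), STNot(STExists("loop", {"a2e3": TRUE}))]))
gamma_a1 = STNot(STAnd([STExists("a->b", {"a1->b1": TRUE}), STNot(STExists("loop"))]))
gamma = STNot(STExists("a", {"a2": gamma_a2}, {"a1": gamma_a1}))
for action in repair_db1({}, gamma, True):
    print(action)
```

Node $a_1$ yields no action because its match sits in $m_f$ of an existential that must be broken, and Rule 7 only descends into $m_t$.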


We define the recursive algorithm $\mathrm{Repair}_{\mathrm{db}}$ to obtain repairs as iterated applications of the single-step repairs computed by $\mathrm{Repair}_{\mathrm{db1}}$.


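Definition 20 (Delta-Based Repair Algorithm $\mathrm{Repair}_{\mathrm{db}}$). If $u = (l : I \hookrightarrow G, r : I \hookrightarrow G') \in \mathcal{U}$, $\gamma \in \mathcal{T}^{\mathrm{ST}}_{i_G}$, and $\gamma' = \mathrm{ppgU}(\gamma, u)$, then $\mathrm{Repair}_{\mathrm{db}}(u, \gamma) = S$ if one of the following cases applies.

- $\models_{\mathrm{ST}} \gamma'$ and $S = \{((\mathrm{id}_{G'}, \mathrm{id}_{G'}), \gamma')\}$.
- $\not\models_{\mathrm{ST}} \gamma'$, $S' = \{(u', \mathrm{ppgU}(\gamma', u')) \mid u' \in \mathrm{Repair}_{\mathrm{db1}}(i_{G'}, \gamma', \mathrm{true})\}$, and $S = \{(u', \gamma'') \in S' \mid \models_{\mathrm{ST}} \gamma''\} \cup \{(u'' \circ u', \gamma''') \mid (u', \gamma'') \in S', \not\models_{\mathrm{ST}} \gamma'', (u'', \gamma''') \in \mathrm{Repair}_{\mathrm{db}}(u', \gamma'), u'' \circ u' \neq \bot\}$.³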


This computation does not terminate when repairs trigger each other ad infinitum. However, a breadth-first computation of $\mathrm{Repair}_{\mathrm{db}}$ gradually computes a set of sound repairs. Obviously, GCs that trigger such nonterminating computations should be avoided, but machinery for detecting such GCs is still called for.
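Because $\mathrm{Repair}_{\mathrm{db}}$ may not terminate, a breadth-first exploration enumerates repairs in order of their length. The generic Python sketch below adds a depth bound `max_depth`, which is our assumption and not part of Definition 20. Here `step` stands in for one application of $\mathrm{Repair}_{\mathrm{db1}}$ followed by $\mathrm{ppgU}$, a recorded action path stands in for the composed update, and the toy states are hypothetical.

```python
from collections import deque


def repair_bfs(start, step, is_consistent, max_depth):
    """Explore single-step repairs breadth-first up to max_depth steps.

    step(state) returns (action, successor) pairs; the result lists every
    consistent state reached, with the sequence of actions leading to it.
    """
    results = []
    queue = deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        if is_consistent(state):
            results.append((state, path))  # stop extending a finished repair
            continue
        if len(path) < max_depth:
            for action, nxt in step(state):
                queue.append((nxt, path + [action]))
    return results


def fix_one(state):
    # Toy single-step repair: remove one of the remaining violations.
    return [(f"fix {v}", state - {v}) for v in sorted(state)]


def ping_pong(state):
    # Toy repairs that trigger each other ad infinitum.
    return [("flip", "B" if state == "A" else "A")]


print(repair_bfs(frozenset({1, 2}), fix_one, lambda s: not s, 2))
print(repair_bfs("A", ping_pong, lambda s: False, 5))  # [] thanks to the bound
```

As in Definition 20, a repair that already yields a consistent graph is reported and not extended further.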

Note that the algorithm $\mathrm{Repair}_{\mathrm{db}}$ computes fewer graph repairs than $\mathrm{Repair}_{\mathrm{sb},2}$ because repairs are applied locally in the scope defined by the GC $\psi$. For example, no repair resulting in the graph marked 4 in Fig. 2 would be constructed. In general, explicitly using bigger contexts in $\psi$ results in the additional computation of less local graph repairs. For example, the condition $\psi$ may be rephrased into $\psi' = \psi \wedge \neg\exists(a\;b, \neg\exists(a \rightarrow b, \mathrm{true}))$ to also obtain the graph repair marked 4 in Fig. 2. We now define the updates that we expect to be computed by $\mathrm{Repair}_{\mathrm{db1}}$ as those that repair a single violation of the GC $\psi$, by requiring a local update to be embeddable into the resulting update via a double pushout diagram as in the DPO approach to graph transformation [16].


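Definition 21 (Locally Least Changing Graph Update). If $G_1$ is a graph, $\psi \in \Phi^{\mathrm{GC}}_{\emptyset}$, $G_1 \not\models_{\mathrm{GC}} \psi$, $(l : I \hookrightarrow G_1, r : I \hookrightarrow G_2) \in \mathcal{U}_c(G_1, \psi)$, $G_2 \models_{\mathrm{GC}} \psi$, $X_1$ is a minimal subgraph of $G_1$ with a violation of $\psi$ that is also a violation of $\psi$ in $G_1$, and the diagram below exists and its right part is a DPO diagram, then $(l, r)$ is a locally least changing graph update.

[Diagram: top row $X_1 \leftarrow I' \rightarrow X_2$, bottom row $G_1 \leftarrow I \rightarrow G_2$, with vertical morphisms from the top row into the bottom row.]

³ If $u_1$ and $u_2$ are updates, then $u_1 \circ u_2 = u$ if $u_1 \leq^{u_2} u$, and $u = \bot$ otherwise (see Definition 4).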
$\mathrm{Repair}_{\mathrm{db1}}$ indeed generates such locally least changing graph updates because the graph $X_1$ in this definition corresponds to the graph $H_1$ and the graph $H_2$ from an ST $\exists(a : H_1 \hookrightarrow H_2, \underline{\phi}, m_t, m_f)$ that is subject to $\mathrm{Repair}_{\mathrm{add}}$ and $\mathrm{Repair}_{\mathrm{del}}$, respectively. For example, for $\mathrm{Repair}_{\mathrm{add}}$, the graph $H_1$ in the ST determines a subgraph in $G$ that is a violation of the overall consistency condition given by a GC $\psi$, as its match cannot be extended to the graph $H_2$.

We now define the locally least changing graph repairs (which are to be computed by $\mathrm{Repair}_{\mathrm{db}}$, such as the graphs marked 1 and 4 in Fig. 4) as the composition of a sequence of locally least changing updates where precisely the last graph update results in a graph satisfying the GC $\psi$.


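Definition 22 (Locally Least Changing Graph Repair). If $G_1$ is a graph, $\psi \in \Phi^{\mathrm{GC}}_{\emptyset}$, $\pi = (l_1 : I_1 \hookrightarrow G_1, r_1 : I_1 \hookrightarrow G_2) \ldots (l_n : I_n \hookrightarrow G_n, r_n : I_n \hookrightarrow G_{n+1})$ is a sequence of locally least changing graph updates, $G_1 \in [\![\psi]\!]$ implies $n = 0$ and $l_1 = r_1 = \mathrm{id}_{G_1}$, $G_i \notin [\![\psi]\!]$ (for each $2 \leq i \leq n$), $G_{n+1} \in [\![\psi]\!]$, $(l, r)$ is the iterated composition of the updates in $\pi$, and $(l, r) \in \mathcal{U}(G_1, \psi)$ is a least changing graph repair, then $(l, r)$ is a locally least changing graph repair.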


We now state that our delta-based graph repair algorithm $\mathrm{Repair}_{\mathrm{db}}$ returns all desired locally least changing graph repairs upon termination.


Theorem 5 (Functional Semantics of $\mathrm{Repair}_{\mathrm{db}}$). $\mathrm{Repair}_{\mathrm{db}}$ is sound (i.e., it generates only locally least changing graph repairs) and complete (upon termination) with respect to locally least changing graph repairs.


The state-based algorithms $\mathrm{Repair}_{\mathrm{sb},1}$ and $\mathrm{Repair}_{\mathrm{sb},2}$ are inappropriate in environments where numerous updates that may invalidate consistency are applied to a large graph, because the procedure of AutoGraph has exponential cost. The incremental delta-based algorithm $\mathrm{Repair}_{\mathrm{db}}$ is a viable alternative when the additional memory required for storing the ST is acceptable. The AutoGraph applications for this algorithm have negligible costs because they may be performed a priori and need only be performed for subconditions of the consistency constraint, which can be assumed to feature reasonably small graphs only.

Finally, a classification of locally least changing repairs is useful for user-based repair selection. Delta preserving repairs, defined below, represent such a basic class, containing only those repairs that preserve the update that resulted in a graph not satisfying the GC $\psi$; that is, it may be desirable to avoid repairs that revert additions or deletions of this update. In our example, the repair related to the graph marked 4 in Fig. 4 is not delta preserving w.r.t. $u$ from Fig. 3a.


Definition 23 (Delta Preserving Graph Repair). If $\psi \in \Phi^{\mathrm{GC}}_{\emptyset}$, $u_2 = (l_2 : I_2 \hookrightarrow G_2, r_2 : I_2 \hookrightarrow G_3) \in \mathcal{U}(G_2, \psi)$ is a graph repair, $u_1 = (l_1 : I_1 \hookrightarrow G_1, r_1 : I_1 \hookrightarrow G_2)$ is a graph update, and there exists a graph update $u$ such that $u_1 \leq^{u_2} u$, then $u_2$ is a delta preserving graph repair with respect to $u_1$.


7 Related Work 


According to the recent survey on model repair [12] and the corresponding exhaustive classification of primary studies selected in the literature review, published online [11], the number and wide variety of existing approaches make a detailed comparison with all of them infeasible.

We consider our approach to be innovative, not only because of the proposed solutions, but also because it addresses the issues of completeness and least changing for incremental graph repair in a precise and formal way. From the survey [11,12] we can see that only two other approaches [10,19] address completeness and least changing, also relying on constraint-solving technology. The main difference with our approach is that they are not incremental. In particular, the work of Schoenboeck et al. [19] proposes a logic programming approach allowing the exploration of model repair solutions ranked according to some quality criteria, re-establishing conformance of a model with its metamodel. Soundness and completeness of these repair actions are not formally proven. Moreover, the least changing bidirectional model transformation approach of Macedo et al. [10] only performs a bounded search for repairs, relying on a bounded constraint solver.

Some recent work on rule-based graph repair [9] (not covered by the survey) addresses the least-changing principle by developing so-called maximally preserving (items are preserved whenever possible) repair programs. This state-based approach considers a subset of the consistency constraints (up to nesting depth 2) handled by our approach, and it is not complete, since it only produces repairs including a minimal amount of deletions. Another recent rule-based graph repair approach [13,20] (also not covered by the survey) proposes so-called change preserving repairs (similar to what we define as delta preserving). The main difference with our work is that we do not require the user to specify consistency-preserving operations from which repairs are generated, since we derive repairs directly from the consistency constraints using constraint solving techniques.

Finally, there is a variety of work on incremental evaluation of graph queries (see e.g. [2,4]), developed with the aim of efficiently re-evaluating a graph query after an update has been performed. Although not employed with the specific aim of complete and least changing graph repair, this work is related to our newly introduced concept of satisfaction trees, as it also uses specific data structures to record in some detail the set of answers to a given query (as described for graph conditions, for example, also in [3]). It is part of ongoing work to evaluate how STs can be employed similarly in this field of incremental query evaluation.


8 Conclusion and Future Work 


We presented a logic-based incremental approach to graph repair. It is the first approach to graph repair returning a sound and complete overview of least changing repairs with respect to graph conditions equivalent to first-order logic on graphs. Technically, it relies on an existing model generation procedure for graph conditions together with the newly introduced concept of satisfaction trees, which encode whether and how a graph satisfies a graph condition.

As future work, we aim at supporting partial consistency and gradually improving it. We are confident that we can extend our work to support attributes, since our underlying model generation procedure supports them. Ongoing work addresses the support of more expressive consistency constraints, allowing path-related properties. Moreover, we are in the process of implementing the algorithms presented here and evaluating them on a variety of case studies. The evaluation also pertains to the overall efficiency (for which we employ techniques for localized pattern matching) and includes a comparison with other approaches for graph repair. Finally, we aim at presenting new and refined properties distinguishing between all possible repairs, supporting the implementation of interactive repair selection procedures.
Abstract. Deep Neural Networks (DNNs) are increasingly deployed in
safety-critical applications including autonomous vehicles and medical 
diagnostics. To reduce the residual risk for unexpected DNN behaviour 
and provide evidence for their trustworthy operation, DNNs should be 
thoroughly tested. The DeepFault whitebox DNN testing approach pre- 
sented in our paper addresses this challenge by employing suspiciousness 
measures inspired by fault localization to establish the hit spectrum of 
neurons and identify suspicious neurons whose weights have not been cal- 
ibrated correctly and thus are considered responsible for inadequate DNN 
performance. DeepFault also uses a suspiciousness-guided algorithm to 
synthesize new inputs, from correctly classified inputs, that increase the 
activation values of suspicious neurons. Our empirical evaluation on sev- 
eral DNN instances trained on MNIST and CIFAR-10 datasets shows 
that DeepFault is effective in identifying suspicious neurons. Also, the 
inputs synthesized by DeepFault closely resemble the original inputs, 
exercise the identified suspicious neurons and are highly adversarial. 


Keywords: Deep Neural Networks · Fault localization · Test input generation


1 Introduction 


Deep Neural Networks (DNNs) [33] have demonstrated human-level capabilities in several intractable machine learning tasks including image classification [10], natural language processing [56] and speech recognition [19]. These impressive achievements raised the expectations for deploying DNNs in real-world applications, especially in safety-critical domains. Early-stage applications include air traffic control [25], medical diagnostics [34] and autonomous vehicles [5]. The responsibilities of DNNs in these applications vary from carrying out well-defined tasks (e.g., detecting abnormal network activity [11]) to controlling the behaviour of the entire system (e.g., end-to-end learning in autonomous vehicles [5]).
Despite the anticipated benefits from a widespread adoption of DNNs, their deployment in safety-critical systems must be characterized by a high degree of dependability. In safety-critical domains, deviations from the expected behaviour or correct operation can endanger human lives or cause significant financial loss. Arguably, DNN-based systems should be granted permission for use in the public domain only after exhibiting high levels of trustworthiness [6].

Software testing is the de facto instrument for analysing and evaluating the quality of a software system [24]. On the one hand, testing reduces risk by proactively finding and eliminating problems (bugs); on the other hand, its results provide evidence that the system actually achieves the required levels of safety. Research contributions and advice on best practices for testing conventional software systems are plentiful; [63], for instance, provides a comprehensive review of the state-of-the-art testing approaches.

Nevertheless, there are significant challenges in applying traditional software testing techniques to assess the quality of DNN-based software [54]. Most importantly, the little correlation between the behaviour of a DNN and the software used for its implementation means that the behaviour of the DNN cannot be explicitly encoded in the control flow structures of the software [51]. Furthermore, DNNs have very complex architectures, typically comprising thousands or millions of parameters, making it difficult, if not impossible, to determine a parameter's contribution to achieving a task. Likewise, since the behaviour of a DNN is heavily influenced by the data used during training, collecting enough data to exercise all potential DNN behaviour under all possible scenarios becomes a very challenging task. Hence, there is a need for systematic and effective testing frameworks for evaluating the quality of DNN-based software [6].

Recent research in the DNN testing area introduces novel white-box and black-box techniques for testing DNNs [20,28,36,37,48,54,55]. Some techniques transform valid training data into adversarial inputs through mutation-based heuristics [65], apply symbolic execution [15], combinatorial [37] or concolic testing [55], while others propose new DNN-specific coverage criteria, e.g., neuron coverage [48] and its variants [35] or MC/DC-inspired criteria [52]. We review related work in Section 6. These recent advances provide evidence that, while traditional software testing techniques are not directly applicable to testing DNNs, the sophisticated concepts and principles behind these techniques, if adapted appropriately, could be useful to the machine learning domain. Nevertheless, none of the proposed techniques uses fault localization [4,47,63], which can identify the parts of a system that are most responsible for incorrect behaviour.

In this paper, we introduce DeepFault, the first fault localization-based white-box testing approach for DNNs. The objectives of DeepFault are twofold: (i) identification of suspicious neurons, i.e., neurons likely to be more responsible for incorrect DNN behaviour; and (ii) synthesis of new inputs, using correctly classified inputs, that exercise the identified suspicious neurons. Similar to conventional fault localization, which receives as input a faulty program and outputs a ranked list of suspicious code locations where the program may be defective [63], DeepFault analyzes the behaviour of neurons of a DNN after training to establish their hit spectrum and identifies suspicious neurons by employing suspiciousness measures. DeepFault employs a suspiciousness-guided algorithm to synthesize new inputs that achieve high activation values for suspicious neurons by modifying correctly classified inputs. Our empirical evaluation on the popular publicly available datasets MNIST [32] and CIFAR-10 [1] provides evidence that DeepFault can identify neurons which can be held responsible for insufficient network performance. DeepFault can also synthesize new inputs, which closely resemble the original inputs, are highly adversarial and increase the activation values of the identified suspicious neurons. To the best of our knowledge, DeepFault is the first research attempt that introduces fault localization for DNNs to identify suspicious neurons and synthesize new, likely adversarial, inputs.
Overall, the main contributions of this paper are: 


- The DeepFault approach for whitebox testing of DNNs driven by fault localization;
- An algorithm for identifying suspicious neurons that adapts suspiciousness measures from the domain of spectrum-based fault localization;
- A suspiciousness-guided algorithm to synthesize inputs that achieve high activation values of potentially suspicious neurons;
- A comprehensive evaluation of DeepFault on two public datasets (MNIST and CIFAR-10) demonstrating its feasibility and effectiveness.


The remainder of the paper is structured as follows. Section 2 briefly presents DNNs and fault localization in traditional software testing. Section 3 introduces DeepFault and Section 4 presents its open-source implementation. Section 5 describes the experimental setup, research questions and evaluation carried out. Sections 6 and 7 discuss related work and conclude the paper, respectively.


2 Background 


2.1 Deep Neural Networks 


We consider Deep Learning software systems in which one or more system modules are controlled by DNNs [13]. A typical feed-forward DNN comprises multiple interconnected neurons organised into several layers: the input layer, the output layer and at least one hidden layer (Fig. 1). Each DNN layer comprises a sequence of neurons. A neuron denotes a computing unit that applies a nonlinear activation function to its inputs and transmits the result to neurons in the successive layer. Commonly used activation functions are sigmoid, hyperbolic tangent, ReLU (Rectified Linear Unit) and leaky ReLU [13]. Except for the output layer, every neuron is connected to neurons in the successive layer with weights, i.e., edges, whose values signify the strength of a connection between neuron pairs. Once the DNN architecture is defined, i.e., the number of layers, neurons per layer and activation functions, the DNN undergoes a training process using a large amount of labelled training data to find weight values that minimise a cost function.

[Fig. 1 graphic: input layer, hidden layer 1, hidden layer 2; sensor labels Camera and IR Sensor; output label Steering.]

Fig. 1. A four layer fully-connected DNN that receives inputs from vehicle sensors (camera, LiDAR, infrared) and outputs a decision for speed, steering angle and brake.

In general, a DNN could be considered as a parametric multidimensional
function that consumes input data (e.g., raw image pixels) in its input layer,
extracts features, i.e., semantic concepts, by performing a series of nonlinear
transformations in its hidden layers, and, finally, produces a decision that
matches the effect of these computations in its output layer.
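To make the layer-by-layer view concrete, the following is a minimal sketch, assuming plain Python lists instead of a deep learning framework, of a forward pass through fully connected layers with the leaky ReLU activation. The names `leaky_relu`, `dense_forward` and `toy_layers` are illustrative only, and for simplicity the same activation is applied to every layer, whereas real classifiers usually end with a softmax output layer.

```python
from typing import List, Sequence, Tuple

# One fully connected layer: weights[out][in] and biases[out] (hypothetical layout).
Layer = Tuple[List[List[float]], List[float]]


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Identity for positive inputs, small slope alpha for the rest."""
    return x if x > 0 else alpha * x


def dense_forward(inputs: Sequence[float], layers: List[Layer],
                  alpha: float = 0.01) -> List[List[float]]:
    """Propagate inputs through the layers and return the activation vector
    of every layer after the input layer (a whitebox view of the network)."""
    activations: List[List[float]] = []
    current = list(inputs)
    for weights, biases in layers:
        # Weighted sum of the previous layer's outputs plus bias, then the nonlinearity.
        current = [
            leaky_relu(sum(w * x for w, x in zip(row, current)) + b, alpha)
            for row, b in zip(weights, biases)
        ]
        activations.append(current)
    return activations


# Toy network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
toy_layers: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(dense_forward([2.0, 1.0], toy_layers))  # [[1.0, 1.5], [2.5]]
```

Returning every layer's activations, rather than only the final output, is what enables the whitebox analyses that follow, which inspect hidden neurons directly.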


2.2 Software Fault Localization 


Fault localization (FL) is a whitebox testing technique that focuses on identifying
source code elements (e.g., statements, declarations) that are more likely to
contain faults. The general FL process [63] for traditional software uses as inputs
a program P, corresponding to the system under test, and a test suite T, and
employs an FL technique to test P against T and establish subsets that represent
the passed and failed tests. Using these sets and information regarding program
elements $p \in P$, the FL technique extracts fault localization data which is then
employed by an FL measure to establish the “suspiciousness” of each program
element p. Spectrum-based FL, the most studied class of FL techniques, uses
program traces (called program spectra) of successful and failed test executions
to establish for program element p the tuple $(e_s, e_f, n_s, n_f)$. Members $e_s$
and $e_f$ ($n_s$ and $n_f$) represent the number of times the corresponding program
element has been (has not been) executed by tests that succeeded and failed,
respectively. A spectrum-based FL measure consumes this list of tuples and ranks
the program elements in decreasing order of suspiciousness, enabling software
engineers to inspect program elements and find faults effectively. For a
comprehensive survey of state-of-the-art FL techniques, see [63].
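The spectrum tuples can be illustrated with a small sketch, assuming a boolean coverage matrix that records which test executed which program element. The function names are hypothetical, and Ochiai (one of the measures adopted later in this paper) is used here only as an example ranking measure.

```python
import math
from typing import Callable, List, Tuple

Spectrum = Tuple[int, int, int, int]  # (e_s, e_f, n_s, n_f)


def program_spectra(coverage: List[List[bool]], passed: List[bool]) -> List[Spectrum]:
    """coverage[t][p] is True if test t executed program element p;
    passed[t] is True if test t succeeded."""
    spectra: List[Spectrum] = []
    for p in range(len(coverage[0])):
        e_s = e_f = n_s = n_f = 0
        for row, ok in zip(coverage, passed):
            if row[p]:          # element executed by this test
                if ok:
                    e_s += 1
                else:
                    e_f += 1
            elif ok:            # element not executed, test passed
                n_s += 1
            else:               # element not executed, test failed
                n_f += 1
        spectra.append((e_s, e_f, n_s, n_f))
    return spectra


def ochiai(e_s: int, e_f: int, n_s: int, n_f: int) -> float:
    denom = math.sqrt((e_f + n_f) * (e_f + e_s))
    return e_f / denom if denom else 0.0


def rank(spectra: List[Spectrum], measure: Callable[..., float]) -> List[int]:
    """Element indices in decreasing order of suspiciousness."""
    return sorted(range(len(spectra)), key=lambda p: measure(*spectra[p]), reverse=True)


coverage = [
    [True, True, False],   # test 0 (passed)
    [True, False, True],   # test 1 (passed)
    [True, True, True],    # test 2 (failed)
    [False, True, False],  # test 3 (failed)
]
passed = [True, True, False, False]
spectra = program_spectra(coverage, passed)
print(spectra)                # [(2, 1, 0, 1), (1, 2, 1, 0), (1, 1, 1, 1)]
print(rank(spectra, ochiai))  # [1, 2, 0]
```

Element 1 ranks first because it is executed by both failing tests and by only one passing test.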


3 DeepFault 


In this section, we introduce DeepFault, our whitebox approach for systematically
testing DNNs by identifying and localizing highly erroneous neurons across a DNN.
Given a pre-trained DNN, DeepFault, whose workflow is shown in Fig. 2, performs a
series of analysis, identification and synthesis steps to identify highly erroneous
DNN neurons and synthesize new inputs that exercise these neurons. We describe
the DeepFault steps in Sections 3.1, 3.2 and 3.3.

We use the following notation to describe DeepFault. Let $\mathcal{N}$ be a DNN with
$l$ layers. Each layer $L_i$, $1 \le i \le l$, consists of $s_i$ neurons and the total
number of neurons in $\mathcal{N}$ is given by $s = \sum_{i=1}^{l} s_i$. Also, let
$n_{i,j}$ be the $j$-th neuron in the $i$-th layer. When the context is clear, we use
$n \in \mathcal{N}$ to denote any neuron which is part of the DNN $\mathcal{N}$
irrespective of its layer. Likewise, we use $\mathcal{N}_H$ to denote the neurons
which belong to the hidden layers of $\mathcal{N}$, i.e.,
$\mathcal{N}_H = \{n_{i,j} \mid 1 < i < l,\ 1 \le j \le s_i\}$. We use $\mathcal{T}$ to
denote the set of test inputs from the input domain of $\mathcal{N}$, $t \in \mathcal{T}$
to denote a concrete input, and $u \in t$ for an element of $t$. Finally, we use the
function $\phi(t, n)$ to signify the output of the activation function of neuron
$n \in \mathcal{N}$ for input $t$.


3.1 Neuron Spectrum Analysis 


The first step of DeepFault involves the analysis of neurons within a DNN to
establish suitable neuron-based attributes that will drive the detection and
localization of faulty neurons. As highlighted in recent research [18,48], the
adoption of whitebox testing techniques provides additional useful insights regarding
internal neuron activity and network behaviour. These insights cannot be easily
extracted through black-box DNN testing, i.e., assessing the performance of a
DNN considering only the decisions made given a set of test inputs $\mathcal{T}$.


Fig. 2. DeepFault workflow: a trained DNN and a testing set $\mathcal{T}$ pass through
neuron spectrum analysis, suspicious neuron identification (using a suspiciousness
measure) and suspiciousness-guided input synthesis from correct classifications,
producing synthesized (adversarial) inputs.


DeepFault initiates the identification of suspicious neurons by establishing
attributes that capture a neuron’s execution pattern. These attributes are
defined as follows. Attributes $attr_n^{as}$ and $attr_n^{af}$ signify the number of
times neuron $n$ was active (i.e., the result of the activation function $\phi(t, n)$
was above the predefined threshold) and the network made a successful or failed
decision, respectively. Similarly, attributes $attr_n^{ns}$ and $attr_n^{nf}$ cover the
case in which neuron $n$ is not active. DeepFault analyses the behaviour of neurons
in the DNN hidden layers, under a specific test set $\mathcal{T}$, to assemble a Hit
Spectrum (HS) for each neuron, i.e., a tuple describing its dynamic behaviour. We
formally define the HS as follows.


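Definition 1. Given a DNN $\mathcal{N}$ and a test set $\mathcal{T}$, we say that for any
neuron $n \in \mathcal{N}_H$ its hit spectrum is given by the tuple
$HS_n = (attr_n^{as}, attr_n^{af}, attr_n^{ns}, attr_n^{nf})$.

Note that the sum of the elements of each neuron’s HS should be equal to the size
of $\mathcal{T}$, i.e., $attr_n^{as} + attr_n^{af} + attr_n^{ns} + attr_n^{nf} = |\mathcal{T}|$.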
Clearly, the interpretation of a hit spectrum (cf. Definition 1) is meaningful
only for neurons in the hidden layers of a DNN. Since neurons within the input
layer $L_1$ correspond to elements from the input domain (e.g., pixels from an image
captured by a camera in Fig. 1), we consider them to be “correct-by-construction”.
Hence, these neurons cannot be credited or held responsible for a successful or
failed decision made by the network. Furthermore, input neurons are always active
and thus propagate their values, one way or another, to neurons in the following
layer. Likewise, neurons within the output layer $L_l$ simply aggregate values from
neurons in the penultimate layer $L_{l-1}$, multiplied by the corresponding weights,
and thus have limited influence on the overall network behaviour and, accordingly,
on decision making.
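A minimal sketch of the neuron spectrum analysis follows, assuming the hidden-layer activations φ(t, n) have already been recorded for every test input and that a neuron counts as active when its activation exceeds a threshold. The default threshold of 0.0 and the name `hit_spectra` are assumptions for illustration; the text above only refers to a predefined threshold.

```python
from typing import Dict, List, Sequence


def hit_spectra(hidden_activations: List[List[float]], correct: Sequence[bool],
                threshold: float = 0.0) -> List[Dict[str, int]]:
    """Build HS_n = (as, af, ns, nf) for every hidden neuron.

    hidden_activations[t][j] holds phi(t, n_j) for test input t;
    correct[t] is True if the DNN decision for t was correct.
    threshold is an assumed activation threshold (hypothetical default)."""
    spectra = [{"as": 0, "af": 0, "ns": 0, "nf": 0}
               for _ in range(len(hidden_activations[0]))]
    for acts, ok in zip(hidden_activations, correct):
        for j, value in enumerate(acts):
            # "a" = active, "n" = not active; "s" = successful, "f" = failed decision.
            key = ("a" if value > threshold else "n") + ("s" if ok else "f")
            spectra[j][key] += 1
    return spectra


acts = [[0.5, -0.1], [0.0, 2.0], [1.2, 0.3]]  # 3 test inputs, 2 hidden neurons
correct = [True, False, False]
hs = hit_spectra(acts, correct)
print(hs)  # [{'as': 1, 'af': 1, 'ns': 0, 'nf': 1}, {'as': 0, 'af': 2, 'ns': 1, 'nf': 0}]
# Each hit spectrum sums to |T|, as noted after Definition 1.
assert all(sum(s.values()) == len(acts) for s in hs)
```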


3.2 Suspicious Neurons Identification 


During this step, DeepFault consumes the set of hit spectra, derived from
DNN analysis, and identifies suspicious neurons which are likely to have made
significant contributions to inadequate DNN performance (low accuracy/high
loss). To achieve this identification, DeepFault employs a spectrum-based
suspiciousness measure which computes a suspiciousness score per neuron using
spectrum-related information. Neurons with the highest suspiciousness score are
more likely to have been trained unsatisfactorily and, hence, to contribute more
to incorrect DNN decisions. This indicates that the weights of these neurons need
further calibration [13]. We define neuron suspiciousness as follows.


Table 1. Suspiciousness measures used in DeepFault

Suspiciousness Measure | Algebraic Formula
Tarantula [23] | $\dfrac{attr_n^{af} / (attr_n^{af} + attr_n^{nf})}{attr_n^{af} / (attr_n^{af} + attr_n^{nf}) + attr_n^{as} / (attr_n^{as} + attr_n^{ns})}$
Ochiai [42] | $\dfrac{attr_n^{af}}{\sqrt{(attr_n^{af} + attr_n^{nf}) \cdot (attr_n^{af} + attr_n^{as})}}$
D* [62] | $\dfrac{(attr_n^{af})^{*}}{attr_n^{as} + attr_n^{nf}}$

$* > 0$ is a variable. We used $* = 3$, one of the most widely explored values [47,63].


Algorithm 1. Identification of suspicious neurons
1: function SuspiciousNeuronsIdentification(𝒩, 𝒯, k)
2:   S ← ∅                                   ▷ suspiciousness vector
3:   for all n ∈ 𝒩 do
4:     HSₙ ← ∅                               ▷ n-th neuron hit spectrum vector
5:     for all p ∈ {as, af, ns, nf} do
6:       aₙᵖ ← Attr(𝒯, p)                    ▷ establish attribute for property p
7:       HSₙ ← HSₙ ∪ {aₙᵖ}                   ▷ construct hit spectrum (cf. Def. 1)
8:     S ← S ∪ {Susp(HSₙ)}                   ▷ determine neuron suspiciousness (cf. Def. 2)
9:   SN ← {n | Susp(HSₙ) ∈ Select(S, k)}     ▷ select the k most suspicious neurons
10:  return SN
Definition 2. Given a neuron $n \in \mathcal{N}_H$ with $HS_n$ being its hit spectrum,
a neuron’s spectrum-based suspiciousness is given by the function
$\mathrm{Susp}_n : HS_n \to \mathbb{R}$.


Intuitively, a suspiciousness measure facilitates the derivation of correlations
between a neuron’s behaviour given a test set $\mathcal{T}$ and the failure pattern of
$\mathcal{T}$ as determined by the overall network behaviour. Neurons whose behaviour
pattern is close to the failure pattern of $\mathcal{T}$ are more likely to operate
unreliably and, consequently, should be assigned higher suspiciousness. Likewise,
neurons whose behaviour pattern is dissimilar to the failure pattern of $\mathcal{T}$
are considered more trustworthy and their suspiciousness values should be low.

In this paper, we instantiate DeepFault with three different suspiciousness 
measures, i.e., Tarantula [23], Ochiai [42] and D* [62] whose algebraic formulae 
are shown in Table 1. The general principle underlying these suspiciousness mea- 
sures is that the more often a neuron is activated by test inputs for which the 
DNN made an incorrect decision, and the less often the neuron is activated by 
test inputs for which the DNN made a correct decision, the more suspicious the 
neuron is. These suspiciousness measures have been adapted from the domain of 
fault localization in software engineering [63] in which they have achieved com- 
petitive results in automated software debugging by isolating the root causes 
of software failures while reducing human input. To the best of our knowledge, 
DeepFault is the first approach that proposes to incorporate these suspiciousness 
measures into the DNN domain for the identification of defective neurons. 
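The three measures of Table 1 translate directly into code. The sketch below is an illustration over a hit spectrum (as, af, ns, nf); how to treat zero denominators is not specified in the text, so returning 0.0 (or infinity for D* when the neuron is never active on a passing input and never inactive on a failing one) is our assumption.

```python
import math


def tarantula(a_s: int, a_f: int, n_s: int, n_f: int) -> float:
    fail_ratio = a_f / (a_f + n_f) if a_f + n_f else 0.0
    pass_ratio = a_s / (a_s + n_s) if a_s + n_s else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(a_s: int, a_f: int, n_s: int, n_f: int) -> float:
    denom = math.sqrt((a_f + n_f) * (a_f + a_s))
    return a_f / denom if denom else 0.0


def dstar(a_s: int, a_f: int, n_s: int, n_f: int, star: int = 3) -> float:
    denom = a_s + n_f
    if denom == 0:
        # Assumption: no passing input activates n and no failing input misses it.
        return math.inf if a_f > 0 else 0.0
    return a_f ** star / denom


hs = (3, 4, 5, 1)  # (as, af, ns, nf) for one hypothetical neuron
print(tarantula(*hs), ochiai(*hs), dstar(*hs))  # D* gives 4**3 / (3 + 1) = 16.0
```

All three measures grow with $attr_n^{af}$ and shrink as the neuron is active on correctly classified inputs, which matches the general principle stated above.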

Algorithm 2. New input synthesis guided by the identified suspicious neurons
Input: SN ← suspicious neurons (Algorithm 1), step ← step size in gradient ascent,
       𝒯ₛ ← test inputs correctly classified by 𝒩, d ← new inputs maximum allowed distance
1: function SuspiciousnessGuidedInputSynthesis(SN, 𝒯ₛ, d, step)
2:   NT ← ∅                                  ▷ set of synthesized inputs
3:   for all t ∈ 𝒯ₛ do
4:     Gₜ ← ∅                                ▷ gradient collection of suspicious neurons
5:     for all n ∈ SN do
6:       nᵥ ← φ(t, n)                        ▷ determine output of neuron
7:       G ← ∂nᵥ/∂t                          ▷ establish gradient of neuron for t
8:       Gₜ ← Gₜ ∪ {G}                       ▷ collect gradients of suspicious neurons for t
9:     t′ ← ∅                                ▷ initialisation of input to be synthesised
10:    for all u ∈ t do
11:      u_gradient ← Σ_{G ∈ Gₜ} Gᵤ / |Gₜ|   ▷ determine average gradient of u
12:      u_gradient ← GradientConstraint(u_gradient, d, step)
13:      t′ ← t′ ∪ {DomainConstraints(u + u_gradient)}
14:    NT ← NT ∪ {t′}
15:  return NT

The use of suspiciousness measures in DNNs targets the identification of a set
of defective neurons rather than diagnosing an isolated defective neuron. Since
the output of a DNN decision task is typically based on the aggregated effects of
its neurons (computation units), with each neuron making its own contribution
to the whole computation procedure [13], identifying a single point of failure (i.e.,
a single defective neuron) has limited value. Thus, after establishing the
suspiciousness of neurons in the hidden layers of a DNN, the neurons are ordered in
decreasing order of suspiciousness and the k, $1 \le k \le s$, most probably defective
(i.e., “undertrained”) neurons are selected. Algorithm 1 presents the high-level
steps for identifying and selecting the k most suspicious neurons. When multiple
neurons achieve the same suspiciousness score, DeepFault resolves ties by
prioritising neurons that belong to deeper hidden layers (i.e., they are closer to the
output layer). The rationale for this decision lies in the fact that neurons in deeper
layers are able to learn more meaningful representations of the input space [69].
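The selection step of Algorithm 1, including the tie-breaking rule, can be sketched as follows. We assume neurons are identified by hypothetical (layer, position) pairs, with larger layer indices being deeper, and that the suspiciousness measure is passed in as a function; this is an illustration rather than DeepFault's actual implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Neuron = Tuple[int, int]                 # (layer i, position j); larger i = deeper layer
HitSpectrum = Tuple[int, int, int, int]  # (as, af, ns, nf)


def select_suspicious(spectra: Dict[Neuron, HitSpectrum],
                      measure: Callable[..., float], k: int) -> List[Neuron]:
    """Return the k most suspicious hidden neurons.

    Neurons are sorted by decreasing suspiciousness; ties are resolved in
    favour of deeper hidden layers (closer to the output layer)."""
    ranked = sorted(spectra, key=lambda n: (measure(*spectra[n]), n[0]), reverse=True)
    return ranked[:k]


def ochiai(a_s: int, a_f: int, n_s: int, n_f: int) -> float:
    denom = math.sqrt((a_f + n_f) * (a_f + a_s))
    return a_f / denom if denom else 0.0


spectra = {
    (2, 0): (1, 3, 5, 0),
    (2, 1): (5, 0, 1, 3),
    (3, 0): (2, 2, 4, 1),
    (3, 1): (1, 3, 5, 0),  # same score as (2, 0) but in a deeper layer
}
print(select_suspicious(spectra, ochiai, k=2))  # [(3, 1), (2, 0)]
```

Sorting on the pair (score, layer) in descending order applies the suspiciousness order first and uses layer depth only to break exact ties.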


3.3 Suspiciousness-Guided Input Synthesis 


DeepFault uses the selected k most suspicious neurons (cf. Section 3.2) to synthesize
inputs that exercise these neurons and could be adversarial (see Section 5).
The premise underlying the synthesis is that increasing the activation values of
suspicious neurons will cause the propagation of degenerate information, computed
by these neurons, across the network, thus shifting the decision boundaries
in the output layer. To achieve this, DeepFault applies targeted modification of
test inputs from the test set $\mathcal{T}$ for which the DNN made correct decisions (e.g.,
for a classification task, the DNN correctly determined their ground truth classes)
aiming to steer the DNN decision to a different region (see Fig. 2).

Algorithm 2 shows the high-level process for synthesising new inputs based
on the identified suspicious neurons. The synthesis task is underpinned by a
gradient ascent algorithm that aims at determining the extent to which a correctly
classified input should be modified to increase the activation values of suspicious
neurons. For any test input $t \in \mathcal{T}_s$, correctly classified by the DNN, we
extract the value of each suspicious neuron and its gradient in lines 6 and 7,
respectively. Then, by iterating over each input dimension $u \in t$, we determine the
gradient value $u_{gradient}$ by which $u$ will be perturbed (lines 11–12). The value of
$u_{gradient}$ is based on the mean gradient of $u$ across the suspicious neurons,
controlled by the function GradientConstraint. This function uses a
test-set-specific step parameter and a distance parameter d to facilitate the
synthesis of realistic test inputs that are sufficiently close, according to the
$L_\infty$-norm, to the original inputs. We demonstrate later in the evaluation of
DeepFault (cf. Table 4) that these parameters enable the synthesis of inputs similar
to the original. The function DomainConstraints applies domain-specific
constraints, thus ensuring that changes to $u$ due to gradient ascent result in
realistic and physically reproducible test inputs as in [48]. For instance, a
domain-specific constraint for an image classification dataset involves bounding
the pixel values of synthesized images to be within a certain range (e.g., 0–1 for
the MNIST dataset [32]). Finally, we append the updated $u$ to construct a new test
input $t'$ (line 13).
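To illustrate Algorithm 2, the sketch below performs one perturbation step for a single correctly classified input. Because the text does not detail GradientConstraint, we assume it scales the averaged gradient by step and clips the result to [-d, d], which keeps the L∞ distance to the original input within d. DomainConstraints clips values to a valid range such as [0, 1] for MNIST. Gradients are supplied as functions, and the closed-form gradient of a single leaky-ReLU neuron (`dense_neuron_grad`, a hypothetical helper) stands in for the automatic differentiation a framework such as Keras would provide.

```python
from typing import Callable, List, Sequence

GradFn = Callable[[Sequence[float]], List[float]]  # t -> d phi(t, n) / d t


def dense_neuron_grad(w: Sequence[float], b: float, alpha: float = 0.01) -> GradFn:
    """Gradient of phi(t, n) = leaky_relu(w . t + b) with respect to t."""
    def grad(t: Sequence[float]) -> List[float]:
        z = sum(wi * ti for wi, ti in zip(w, t)) + b
        slope = 1.0 if z > 0 else alpha
        return [slope * wi for wi in w]
    return grad


def gradient_constraint(g: float, d: float, step: float) -> float:
    # Assumed behaviour: scale by step, then bound the change to [-d, d].
    return max(-d, min(d, step * g))


def domain_constraints(value: float, lo: float, hi: float) -> float:
    # Keep each input dimension (e.g., a pixel) inside its valid range.
    return max(lo, min(hi, value))


def synthesize(t: Sequence[float], suspicious_grads: List[GradFn], d: float,
               step: float, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    grads = [g(t) for g in suspicious_grads]   # lines 5-8: one gradient per neuron
    new_t: List[float] = []
    for idx, u in enumerate(t):                # lines 10-13
        u_gradient = sum(g[idx] for g in grads) / len(grads)  # mean gradient of u
        u_gradient = gradient_constraint(u_gradient, d, step)
        new_t.append(domain_constraints(u + u_gradient, lo, hi))
    return new_t


neurons = [dense_neuron_grad([1.0, -1.0], 0.0), dense_neuron_grad([0.5, 0.5], 0.0)]
t = [0.2, 0.4]
t_new = synthesize(t, neurons, d=0.1, step=1.0)
print([round(v, 6) for v in t_new])  # [0.3, 0.5]
```

With step = 1 and d = 0.1 (the MNIST settings of Section 5.1), both averaged gradients exceed d and are clipped, so each pixel moves by exactly 0.1.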

As we experimentally show in Section 5, DeepFault instances using these
suspiciousness measures can synthesize adversarial inputs that cause the DNN to
misclassify previously correctly classified inputs. Thus, the identified suspicious
neurons can be attributed a degree of responsibility for the inadequate network
performance, meaning that their weights have not been optimised. This reduces the
DNN’s ability to generalise and to operate correctly on unseen data.


4 Implementation 


To ease the evaluation and adoption of the DeepFault approach (cf. Fig. 2), we
have implemented a prototype tool on top of the open-source machine learning
framework Keras (v2.2.2) [9] with a TensorFlow (v1.10.1) backend [2]. The
full experimental results summarised in the following section are available on
the DeepFault project page at https://DeepFault.github.io.


5 Evaluation 


5.1 Experimental Setup 


We evaluate DeepFault on two popular publicly available datasets. MNIST [32]
is a handwritten digit dataset with 60,000 training samples and 10,000 testing
samples; each input is a 28 × 28 pixel image with a class label from 0 to 9.
CIFAR-10 [1] is an image dataset with 50,000 training samples and 10,000 testing
samples; each input is a 32 × 32 colour image belonging to one of ten classes
(e.g., dog, bird, car).

For each dataset, we study three DNNs that have been used in previous
research [1,60] (Table 2). All DNNs have different architectures and numbers
of trainable parameters. For MNIST, we use fully connected (dense) neural
networks and for CIFAR-10 we use convolutional neural networks with max-pooling
and dropout layers; these have been trained to achieve at least 95% and
70% accuracy on the provided test sets, respectively. The column ‘Architecture’
in Table 2 shows the number of fully connected hidden layers and the number of
neurons per layer. Each DNN uses a leaky ReLU [38] as its activation function
($\alpha = 0.01$), which has been shown to achieve competitive accuracy results [67].

We instantiate DeepFault using the suspiciousness measures Tarantula [23],
Ochiai [42] and D* [62] (Table 1). We analyse the effectiveness of DeepFault
instances using different numbers of suspicious neurons, i.e., $k \in \{1, 2, 3, 5, 10\}$
and $k \in \{10, 20, 30, 40, 50\}$ for MNIST and CIFAR models, respectively. We also
ran preliminary experiments for each model from Table 2 to tune the
hyperparameters of Algorithm 2 and facilitate replication of our findings. Since
gradient values are model and input specific, the perturbation magnitude should
reflect these values and reinforce their impact. We determined empirically that
step = 1 and step = 10 are good values, for MNIST and CIFAR models, respectively,
that enable our algorithm to perturb inputs. We also set the maximum
allowed distance d to be at most 10% ($L_\infty$) with regard to the range of each
input dimension (maximum pixel value). As shown in Table 4, the synthesized
inputs are very similar to the original inputs and are rarely constrained by d.
Studying other step and d values is part of our future work. All experiments were
run on an Ubuntu server with 16 GB memory and an Intel Xeon E5-2698 2.20 GHz.
Table 2. Details of MNIST and CIFAR-10 DNNs used in the evaluation.

Dataset | Model Name | # Trainable Params | Architecture | Accuracy
MNIST | MNIST_1 | 27,420 | <5 × 30> | 96.6%
MNIST | MNIST_2 | 22,975 | <6 × 25> | 95.8%
MNIST | MNIST_3 | 18,680 | <8 × 20> | 95%
CIFAR-10 | CIFAR_1 | 411,434 | <4 × 128> | 70.1%
CIFAR-10 | CIFAR_2 | 724,010 | <2 × 256> | 72.6%
CIFAR-10 | CIFAR_3 | 1,250,858 | <1 × 512> | 76.1%


5.2 Research Questions 
Our experimental evaluation aims to answer the following research questions. 


RQ1 (Validation): Can DeepFault find suspicious neurons effectively?
If suspicious neurons do exist, suspiciousness measures used by DeepFault 
should comfortably outperform a random suspiciousness selection strategy. 

RQ2 (Comparison): How do DeepFault instances using different suspi- 
ciousness measures compare against each other? Since DeepFault can 
work with multiple suspiciousness measures, we examined the results pro- 
duced by DeepFault instances using Tarantula [23], Ochiai [42] and D* [62]. 

RQ3 (Suspiciousness Distribution): How are suspicious neurons found 
by DeepFault distributed across a DNN? With this research question, 
we analyse the distribution of suspicious neurons in hidden DNN layers using 
different suspiciousness measures. 

RQ4 (Similarity): How realistic are inputs synthesized by DeepFault? 
We analysed the distance between synthesized and original inputs to examine 
the extent to which DeepFault synthesizes realistic inputs. 

RQ5 (Increased Activations): Do synthesized inputs increase activa- 
tion values of suspicious neurons? We assess whether the suspiciousness- 
guided input synthesis algorithm produces inputs that reinforce the influence 
of suspicious neurons across a DNN. 

RQ6 (Performance): How efficiently can DeepFault synthesize new 
inputs? We analysed the time consumed by DeepFault to synthesize new 
inputs and the effect of suspiciousness measures used in DeepFault instances. 


5.3 Results and Discussion 


RQ1 (Validation). We apply the DeepFault workflow to the DNNs from
Table 2. To this end, we instantiate DeepFault with a suspiciousness measure,
analyse a pre-trained DNN given the dataset’s test set $\mathcal{T}$, identify k neurons
with the highest suspiciousness scores and synthesize new inputs, from correctly
classified inputs, that exercise these suspicious neurons. Then, we measure the
prediction performance of the DNN on the synthesized inputs using the standard
performance metrics: cross-entropy loss, i.e., the divergence between output
and target distribution, and accuracy, i.e., the percentage of correctly classified
inputs over all given inputs. Note that DNN analysis is done per class, since
inputs from the same class have similar activation patterns [69].
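Both metrics can be computed from the DNN's predicted class probabilities. The sketch below assumes one-hot targets (so cross-entropy reduces to the mean negative log-probability of the true class) and a small epsilon to avoid log(0); both choices, and the function names, are ours rather than taken from the DeepFault tool.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: List[Sequence[float]], labels: Sequence[int],
                  eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the true class over all inputs."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs: List[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of inputs whose most probable class equals the label."""
    hits = sum(1 for p, y in zip(probs, labels)
               if max(range(len(p)), key=lambda c: p[c]) == y)
    return hits / len(labels)


probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # two inputs, three classes
labels = [0, 1]
print(accuracy(probs, labels))                 # 0.5
print(round(cross_entropy(probs, labels), 4))  # 0.7803
```

A misclassified input lowers accuracy by a fixed amount, whereas its contribution to the loss grows as the probability of the true class shrinks, which is why the two metrics are reported together.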

Table 3 shows the average loss and accuracy for inputs synthesized by DeepFault
instances using Tarantula (T), Ochiai (O), D* (D) and a random selection
strategy (R) for different numbers of suspicious neurons k on the MNIST (top)
and CIFAR-10 (bottom) models from Table 2. Each cell value in Table 3, except
for random R, is averaged over 100 synthesized inputs (10 per class). For R, we
collected 500 synthesized inputs (50 per class) over five independent runs, thus
reducing the risk that our findings may have been obtained by chance.

As expected (see Table 3), DeepFault using any suspiciousness measure (T,
O, D) obtained considerably lower prediction performance than R on MNIST
models. The suspiciousness measures T and O are also effective on CIFAR-10
models, whereas the performance of D and R is similar. These results show
that the identified k neurons are actually suspicious and, hence, their weights
are insufficiently trained. Also, we have sufficient evidence that increasing the
activation value of suspicious neurons by slightly perturbing inputs that have
been classified correctly by the DNN could transform them into adversarial inputs.

We applied the non-parametric Mann-Whitney U test with a 95% confidence
level [61] to check for statistically significant performance differences
between the various DeepFault instances and random. We confirmed a significant
difference between T and R and between O and R (p-value < 0.05) for all MNIST
and CIFAR-10 models and for all k values. We also confirmed the interesting
observation that a significant difference between D and R exists only for MNIST
models (all k values). We plan to investigate this observation further in our
future work.
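For readers who want to reproduce this kind of comparison, the following sketch computes the two-sided Mann-Whitney U test with the large-sample normal approximation (average ranks for ties, no tie or continuity correction). This simplification and the toy loss values are our assumptions; exact p-values for small samples, or those produced by library routines such as SciPy's `scipy.stats.mannwhitneyu`, may differ.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Group tied values and give each the average of their 1-based ranks.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal Z
    return u, p


losses_a = [4.1, 3.7, 5.2, 4.8, 4.4]  # toy per-input losses, strategy A
losses_b = [1.2, 2.0, 1.1, 2.5, 1.6]  # toy per-input losses, strategy B
u, p = mann_whitney_u(losses_a, losses_b)
print(u, p < 0.05)  # 0.0 True
```

A p-value below 0.05 corresponds to the 95% confidence level used above.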

Another interesting observation from Table 3 is the small performance difference
of DeepFault instances for different k values. We investigated this further
by analyzing the activation values of the next k′ most suspicious neurons
according to the suspiciousness order given by Algorithm 1. For instance, if k = 2 we
analysed the activation values of the next $k' \in \{3, 5, 10\}$ most suspicious
neurons. We observed that the synthesized inputs frequently increase the activation
values of the k′ neurons whose suspiciousness scores are also high, in addition
to increasing the values of the top k suspicious neurons.

Considering these results, we have empirical evidence about the existence of
suspicious neurons which can be responsible for inadequate DNN performance.
Also, we confirmed that DeepFault instances using sophisticated suspiciousness
measures significantly outperform a random strategy for most of the studied
DNN models (except for the D-R case on CIFAR models; see RQ3).


RQ2 (Comparison). We compared DeepFault instances using different
suspiciousness measures and carried out pairwise comparisons using the
Mann-Whitney test to check for significant differences between T, O, and D*. We show
the results of these comparisons on the project’s webpage. Ochiai achieves better
results on MNIST_1 and MNIST_3 models for various k values. This result
suggests that the suspicious neurons reported by Ochiai are more responsible
for insufficient DNN performance. D* performs competitively on MNIST_1 and
MNIST_3 for $k \in \{3, 5, 10\}$, but its performance on CIFAR-10 models is
significantly inferior to Tarantula and Ochiai. The best performing suspiciousness
measure in CIFAR models for most k values is, by a large margin, Tarantula.
These findings show that multiple suspiciousness measures could be used for
instantiating DeepFault with competitive performance. We also have evidence
that DeepFault using D* is ineffective for some complex networks (e.g., CIFAR-10),
but the evidence is insufficient to single out a best-performing DeepFault instance.
Our findings conform to the latest research on software fault localization, which
claims that there is no single best spectrum-based suspiciousness measure [47].

Table 3. Accuracy and loss of inputs synthesized by DeepFault on MNIST (top) and
CIFAR-10 (bottom) datasets. The best results per suspiciousness measure are shown
in bold. (k: #suspicious neurons, T: Tarantula, O: Ochiai, D: D*, R: Random)

k | Measure | MNIST_1 (T / O / D / R) | MNIST_2 (T / O / D / R) | MNIST_3 (T / O / D / R)
1 | Loss | 3.55 / 6.19 / 4.03 / 2.42 | 3.48 / 3.53 / 3.97 / 2.78 | 7.35 / 8.23 / 6.36 / 3.66
1 | Accuracy | 0.26 / 0.16 / 0.2 / 0.59 | 0.3 / 0.2 / 0.5 / 0.49 | 0.16 / 0.1 / 0.13 / 0.39
2 | Loss | 3.73 / 6.08 / 3.18 / 2.67 | 3.12 / 3.76 / 4.08 / 0.9 | 4.27 / 6.81 / 6.5 / 3.06
2 | Accuracy | 0.16 / 0.23 / 0.4 / 0.58 | 0.23 / 0.23 / 0.13 / 0.77 | 0.29 / 0.13 / 0.26 / 0.56
3 | Loss | 4.1 / 6.19 / 6.25 / 1.14 | 2.39 / 3.94 / 3.04 / 1.61 | 3.33 / 7.59 / 6.98 / 2.91
3 | Accuracy | 0.23 / 0.23 / 0.33 / 0.77 | 0.46 / 0.26 / 0.23 / 0.67 | 0.26 / 0.06 / 0.16 / 0.61
5 | Loss | 4.63 / 6.68 / 6.97 / 1.1 | 2.49 / 3.64 / 3.48 / 0.94 | 4.15 / 7.22 / 6.47 / 1.22
5 | Accuracy | 0.23 / 0.23 / 0.13 / 0.79 | 0.26 / 0.26 / 0.2 / 0.73 | 0.16 / 0.1 / 0.26 / 0.77
10 | Loss | 4.97 / 6.95 / 7.4 / 1.3 | 2.08 / 3.06 / 3.82 / 0.49 | 4.45 / 7.16 / 5.9 / 0.57
10 | Accuracy | 0.23 / 0.2 / 0.23 / 0.75 | 0.4 / 0.23 / 0.26 / 0.86 | 0.13 / 0.13 / 0.13 / 0.87

k | Measure | CIFAR_1 (T / O / D / R) | CIFAR_2 (T / O / D / R) | CIFAR_3 (T / O / D / R)
10 | Loss | 12.75 / 13.49 / 1.33 / 3.25 | 8.42 / 8.41 / 0 / 2.49 | 6.12 / 1.77 / 1.12 / 1.21
10 | Accuracy | 0.2 / 0.16 / 0.9 / 0.79 | 0.47 / 0.47 / 1.0 / 0.84 | 0.62 / 0.88 / 0.92 / 0.91
20 | Loss | 12.79 / 12.43 / 0.45 / 1.8 | 8.81 / 6.92 / 0.32 / 1.67 | 6.12 / 1.12 / 0.96 / 0.64
20 | Accuracy | 0.2 / 0.22 / 0.96 / 0.88 | 0.44 / 0.55 / 0.97 / 0.89 | 0.62 / 0.92 / 0.93 / 0.95
30 | Loss | 13.19 / 13.13 / 0.38 / 1.43 | 8.35 / 6.32 / 0.55 / 0.86 | 5.64 / 0.76 / 0.42 / 0.41
30 | Accuracy | 0.18 / 0.18 / 0.95 / 0.9 | 0.48 / 0.6 / 0.95 / 0.94 | 0.64 / 0.93 / 0.96 / 0.97
40 | Loss | 13.69 / 11.92 / 0.8 / 1.29 | 9.4 / 5.01 / 0.32 / 0.61 | 4.51 / 1.12 / 0.22 / 0.54
40 | Accuracy | 0.14 / 0.26 / 0.92 / 0.91 | 0.41 / 0.68 / 0.97 / 0.95 | 0.72 / 0.92 / 0.97 / 0.96
50 | Loss | 12.1 / 13.37 / 0.36 / 0.9 | 9.59 / 3.38 / 0 / 0.56 | 4.67 / 0.04 / 0.64 / 0.48
50 | Accuracy | 0.24 / 0.17 / 0.96 / 0.94 | 0.4 / 0.78 / 1.0 / 0.96 | 0.71 / 0.98 / 0.96 / 0.96


RQ3 (Suspiciousness Distribution). We analysed the distribution of suspi- 
cious neurons identified by DeepFault instances across the hidden DNN layers. 
Fig. 3. Suspicious neurons distribution on MNIST_3 (left) and CIFAR_3 (right) models.


Figure 3 shows the distribution of suspicious neurons on MNIST_3 and CIFAR_3
models with k = 10 and k = 50, respectively. Considering MNIST_3, the majority
of suspicious neurons are located at the deeper hidden layers (Dense 4–Dense
8) irrespective of the suspiciousness measure used by DeepFault. This observation
holds for the other MNIST models and k values. On CIFAR_3, however, we
can clearly see variation in the distributions across the suspiciousness measures.
In fact, D* suggests that most of the suspicious neurons belong to initial hidden
layers, which is in contrast with Tarantula’s recommendations. As reported in
RQ2, the inputs synthesized by DeepFault using Tarantula achieved the best
results on CIFAR models, thus showing that the identified neurons are actually
suspicious. This difference in the distribution of suspicious neurons explains the
inferior inputs synthesized by D* on CIFAR models (Table 3).

Another interesting finding concerns the relation between the suspicious neu- 
rons distribution and the “adversarialness” of synthesized inputs. When suspi- 
cious neurons belong to deeper hidden layers, the likelihood of the synthesized 
input being adversarial increases (cf. Table 3 and Fig. 3). This finding is explained 
by the fact that initial hidden layers transform input features (e.g., pixel val- 
ues) into abstract features, while deeper hidden layers extract more semantically 
meaningful features and, thus, have higher influence in the final decision [13]. 


RQ4 (Similarity). We examined the distance between original, correctly classified,
inputs and those synthesized by DeepFault, to establish DeepFault’s ability
to synthesize realistic inputs. Table 4 (left) shows the distance between original
and synthesized inputs for various distance metrics ($L_1$ Manhattan, $L_2$
Euclidean, $L_\infty$ Chebyshev) for different k values (# suspicious neurons). The
distance values, averaged over inputs synthesized using the DeepFault
suspiciousness measures (T, O and D*), demonstrate that the degree of perturbation
is similar irrespective of k for MNIST models, whereas for CIFAR models the
distance decreases as k increases. Given that an MNIST input consists of 784
pixels, with each pixel taking values in [0, 1], the average perturbation per input
is at most about 5.28% of the total possible perturbation ($L_1$ distance). Similarly,
for a CIFAR input that comprises 3072 pixels, with each pixel taking values
in {0, 1, ..., 255}, the average perturbation per input is less than 0.03% of the
total possible perturbation ($L_1$ distance). Thus, for both datasets, the difference
of synthesized inputs to their original versions is very small. We qualitatively
support our findings by showing in Fig. 4 the synthesized images and their
originals for an example set of inputs from the MNIST and CIFAR-10 datasets.

Table 4. Distance between synthesized and original inputs. The values shown represent
minimal perturbation to the original inputs (< 5% for MNIST and < 1% for CIFAR-10).

Left: by number of suspicious neurons k
k: MNIST (CIFAR) | MNIST L1 | MNIST L2 | MNIST L∞ | CIFAR L1 | CIFAR L2 | CIFAR L∞
1 (10) | 41.4 | 2.0 | 0.1 | 179.07 | 7216.6 | 15.46
2 (20) | 41.2 | 1.99 | 0.1 | 144.95 | 5897.4 | 12.45
3 (30) | 40.9 | 1.98 | 0.1 | 124.61 | 5073.9 | 10.67
5 (40) | 40.7 | 1.97 | 0.1 | 113.45 | 4579.2 | 9.89
10 (50) | 40.3 | 1.96 | 0.1 | 104.72 | 4273 | 9.24

Right: by suspiciousness measure
Susp. measure | MNIST L1 | MNIST L2 | MNIST L∞ | CIFAR L1 | CIFAR L2 | CIFAR L∞
Tarantula | 40.3 | 1.97 | 0.1 | 180.23 | 6575.6 | 19.41
Ochiai | 41.0 | 1.98 | 0.1 | 110.45 | 4825.3 | 7.84
D* | 41.5 | 1.99 | 0.1 | 109.4 | 4823.2 | 7.39
Random | 39.2 | 1.92 | 0.1 | 121.73 | 4988.1 | 11.63

Fig. 4. Synthesized images (top) and their originals (bottom). For each dataset,
suspicious neurons are found using (from left to right) Tarantula, Ochiai, D* and Random.

We also compare the distances between original and synthesized inputs based
on the suspiciousness measures (Table 4, right). The distances of inputs synthesized
by DeepFault instances using T, O or D* are very close to those of inputs synthesized
using the random selection strategy ($L_1$ distance). Considering these results, we
can conclude that DeepFault is effective in synthesizing highly adversarial inputs
(cf. Table 3) that closely resemble their original counterparts.
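The distance figures reported for RQ4 rely on the standard L1, L2 and L∞ norms of the difference between an original input and its synthesized counterpart. The helper names below are ours; the second function reproduces the arithmetic behind the MNIST percentage discussed above (an L1 distance of 41.4 out of a maximum possible L1 perturbation of 784 × 1).

```python
import math
from typing import Sequence, Tuple


def distances(original: Sequence[float],
              synthesized: Sequence[float]) -> Tuple[float, float, float]:
    """Return the (L1, L2, L-infinity) distances between two inputs."""
    diffs = [abs(a - b) for a, b in zip(original, synthesized)]
    return sum(diffs), math.sqrt(sum(d * d for d in diffs)), max(diffs)


def share_of_max_l1(l1: float, n_dims: int, value_range: float) -> float:
    """L1 distance as a fraction of the largest possible L1 perturbation,
    i.e., every dimension moved across its whole value range."""
    return l1 / (n_dims * value_range)


print(distances([0.0, 0.5, 1.0], [0.1, 0.5, 0.8]))
print(f"{share_of_max_l1(41.4, 784, 1.0):.2%}")  # 5.28%
```

Reporting all three norms is useful because L∞ captures the largest single-pixel change (bounded by d), while L1 and L2 reflect how many pixels were changed and by how much.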


RQ5 (Increasing Activations). Table 5. Effectiveness of suspiciousness-guided 
We studied the activation values input synthesis algorithm to increase activations 
of suspicious neurons identified values of suspicious neurons. 


by DeepFault to examine whether k: MNIST(CIFAR) 

the synthesized inputs increase 
the values of these neurons. The MNIST | 98% | 99% |97% (97% | 91% 
gradients of suspicious neurons CIFAR | 91% | 92% | 90% | 89% | 88% 

used in our suspiciousness-guided 

input synthesis algorithm might be conflicting and a global increase in all sus- 
picious neurons’ values might not be feasible. This can occur if some neurons’ 
gradients are negative, indicating a decrease in an input feature’s value, whereas 


other gradients are positive and require to increase the value of the same fea- 
ture. Table5 shows the percentage of suspicious neurons k, averaged over all 
suspiciousness measures for all considered MNIST and CIFAR-10 models from Table 2, whose values were increased by the inputs synthesized by DeepFault. For MNIST models, DeepFault synthesized inputs that increase the suspicious neurons' values with a success rate of at least 97% for k ∈ {1, 2, 3, 5}, while the average effectiveness for CIFAR models is 90%. These results show the effectiveness of our suspiciousness-guided input synthesis algorithm in generating inputs that increase the activation values of suspicious neurons (see https://DeepFault.github.io).
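The effectiveness figure in Table 5 is, in essence, the share of suspicious neurons whose activation went up after synthesis. The following is a minimal sketch under stated assumptions: activations are already extracted into dictionaries keyed by a neuron id, and the names `fraction_increased` and `average_effectiveness` are hypothetical helpers, not part of DeepFault. How DeepFault aggregates over inputs is not spelled out here, so averaging per input pair is itself an assumption.

```python
from statistics import mean

def fraction_increased(original, synthesized, suspicious):
    """Share of suspicious neurons whose activation grew after synthesis.

    `original` and `synthesized` map a neuron id, e.g. (layer, index), to its
    activation for one input before and after synthesis (hypothetical layout).
    """
    if not suspicious:
        raise ValueError("need at least one suspicious neuron")
    increased = sum(1 for n in suspicious if synthesized[n] > original[n])
    return increased / len(suspicious)

def average_effectiveness(pairs, suspicious):
    """Average the per-input share over (original, synthesized) activation pairs."""
    return mean(fraction_increased(o, s, suspicious) for o, s in pairs)

# Arithmetic check against Table 5: the CIFAR row averages to 90%.
cifar_row = [91, 92, 90, 89, 88]
print(mean(cifar_row))  # 90
```

A neuron counts as "increased" only on a strict increase; ties count as failures, which is the conservative reading.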


RQ6 (Performance). We measured the performance of Algorithm 2 to synthesize new inputs (https://DeepFault.github.io). The average time required to synthesize a single input for MNIST and CIFAR models is 1 s and 24.3 s, respectively. The performance of the algorithm depends on the number of suspicious neurons (k), the distribution of those neurons over the DNN, and its architecture. For CIFAR models, for instance, the execution time per input ranges between 3 s (k = 10) and 48 s (k = 50). We also confirmed empirically that more time is taken to synthesize an input if the suspicious neurons are in deeper hidden layers.


5.4 Threats to Validity 


Construct validity threats might be due to the adopted experimental method- 
ology including the selected datasets and DNN models. To mitigate this threat, 
we used widely studied public datasets (MNIST [32] and CIFAR-10 [1]), and 
applied DeepFault to multiple DNN models of different architectures with com- 
petitive prediction accuracies (cf. Table 2). Also, we mitigate threats related to 
the identification of suspicious neurons (Algorithm 1) by adapting suspiciousness 
measures from the fault localization domain in software engineering [63]. 
Internal validity threats might occur when establishing the ability of Deep- 
Fault to synthesize new inputs that exercise the identified suspicious neurons. 
To mitigate this threat, we used various distance metrics to confirm that the 
synthesized inputs are close to the original inputs and similar to the inputs syn- 
thesized by a random strategy. Another threat could be that the suspiciousness 
measures employed by DeepFault accidentally outperform the random strategy. 
To mitigate this threat, we reported the results of the random strategy over five 
independent runs per experiment. Also, we ensured that the distribution of the 
randomly selected suspicious neurons resembles the distribution of neurons identified by DeepFault suspiciousness measures. We also used the non-parametric Mann-Whitney U test to check for significant differences between the performance of DeepFault instances and random at a 95% confidence level.
External validity threats might exist if DeepFault cannot access the internal DNN structure to assemble the hit spectra of neurons and establish their suspiciousness. We limit this threat by developing DeepFault using the open-source frameworks Keras and TensorFlow, which enable whitebox DNN analysis. We also examined various spectrum-based suspiciousness measures, but other measures can be investigated [63]. We further reduce the risk that DeepFault might be difficult to use in practice by validating it against several DNN instances trained on two widely-used datasets. However, more experiments are needed to assess the applicability of DeepFault in domains and networks with characteristics different from those used in our evaluation (e.g., LSTM and Capsule networks [50]).
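For readers unfamiliar with the Mann-Whitney U test mentioned above, the sketch below computes the U statistic and a two-sided p-value from the normal approximation (average ranks for ties, no tie correction of the variance). It is a textbook version for illustration only, not necessarily how the DeepFault evaluation computed it; for real analyses a library routine such as SciPy's `scipy.stats.mannwhitneyu` is preferable, especially for small samples where an exact test is more appropriate.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(xs, ys):
    """Return (U for xs, two-sided p-value) using the normal approximation."""
    n1, n2 = len(xs), len(ys)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(xs) + list(ys))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

Applied to, say, per-run scores of a DeepFault instance and of the random strategy, p < 0.05 would indicate a significant difference at the 95% confidence level.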


6 Related Work 


DNN Testing and Verification. The inability of blackbox DNN testing to provide insights about the internal neuron activity and to enable the identification of corner-case inputs that expose unexpected network behaviour [14] urged researchers to leverage whitebox testing techniques from software engineering [28,35,43,48,54]. DeepXplore [48] uses a differential algorithm to generate inputs that increase neuron coverage. DeepGauge [35] introduces multi-granularity coverage criteria for effective test synthesis. Other research proposes testing criteria and techniques inspired by metamorphic testing [58], combinatorial testing [37], mutation testing [36], MC/DC [54], symbolic execution [15] and concolic testing [55].

Formal DNN verification aims at providing guarantees for trustworthy DNN 
operation [20]. Abstraction refinement is used in [49] to verify safety properties of 
small neural networks with sigmoid activation functions, while AI² [12] employs abstract interpretation to verify similar properties. Reluplex [26] is an SMT-based approach that verifies safety and robustness of DNNs with ReLUs, and
DeepSafe [16] uses Reluplex to identify safe regions in the input space. DLV [60] 
can verify local DNN robustness given a set of user-defined manipulations. 

DeepFault adopts spectrum-based fault localization techniques to systemati- 
cally identify suspicious neurons and uses these neurons to synthesize new inputs, 
which is mostly orthogonal to existing research on DNN testing and verification. 


Adversarial Deep Learning. Recent studies have shown that DNNs are vul- 
nerable to adversarial examples [57] and proposed search algorithms [8, 40,41, 44], 
based on gradient descent or optimisation techniques, for generating adversarial 
inputs that have a minimal difference to their original versions and force the DNN 
to exhibit erroneous behaviour. These types of adversarial examples have been 
shown to exist in the physical world too [29]. The identification of and protection against these adversarial attacks is another active area of research [45,59]. DeepFault is similar to these approaches since it uses the identified suspicious neurons to synthesize perturbed inputs which, as we have demonstrated in Section 5, are adversarial. Extending DeepFault to support the synthesis of adversarial inputs
using these adversarial search algorithms is part of our future work. 


Fault Localization in Traditional Software. Fault localization is widely studied in many software engineering areas including software debugging [46], program repair [17] and failure reproduction [21,22]. The research focus
in fault localization is the development of identification methods and suspicious- 
ness measures that isolate the root causes of software failures with reduced engi- 
neering effort [47]. The most notable fault localization methods are spectrum- 
based [3, 23, 30, 31,62], slice-based [64] and model-based [39]. Threats to the value 
of empirical evaluations of spectrum-based fault localization are studied in [53], while the theoretical analyses in [66,68] lay a formal foundation for the desirable properties that suspiciousness measures should have. We refer interested
readers to a recent comprehensive survey on fault localization [63]. 


7 Conclusion 


The potential deployment of DNNs in safety-critical applications introduces 
unacceptable risks. To reduce these risks to acceptable levels, DNNs should be 
tested thoroughly. We contribute to this effort by introducing DeepFault, the
first fault localization-based whitebox testing approach for DNNs. DeepFault 
analyzes pre-trained DNNs, given a specific test set, to establish the hit spec- 
trum of each neuron, identifies suspicious neurons by employing suspiciousness 
measures and synthesizes new inputs that increase the activation values of the 
suspicious neurons. Our empirical evaluation on the widely-used MNIST and 
CIFAR-10 datasets shows that DeepFault can identify neurons which can be 
held responsible for inadequate performance. DeepFault can also synthesize new 
inputs, which closely resemble the original inputs, are highly adversarial and 
exercise the identified suspicious neurons. In future work, we plan to evaluate 
DeepFault on other DNNs and datasets, to improve the suspiciousness-guided 
synthesis algorithm and to extend the synthesis of adversarial inputs [44]. We will 
also explore techniques to repair the identified suspicious neurons, thus enabling us to reason about the safety of DNNs and to support safety case generation [7, 27].
Abstract. Variability models allow effective building of many custom
model variants for various configurations. Lifted model checking for a 
variability model is capable of verifying all its variants simultaneously 
in a single run by exploiting the similarities between the variants. The 
computational cost of lifted model checking still greatly depends on the 
number of variants (the size of configuration space), which is often huge. 
One of the most promising approaches to fighting the configuration space explosion problem in lifted model checking is the use of variability abstractions. In
this work, we define a novel game-based approach for variability-specific 
abstraction and refinement for lifted model checking of the full CTL, 
interpreted over 3-valued semantics. We propose a direct algorithm for 
solving a 3-valued (abstract) lifted model checking game. In case the 
result of model checking an abstract variability model is indefinite, we 
suggest a new notion of refinement, which eliminates indefinite results. 
This provides an iterative incremental variability-specific abstraction and 
refinement framework, where refinement is applied only where indefinite 
results exist and definite results from previous iterations are reused. 


1 Introduction 


A Software Product Line (SPL) [6] is an efficient method for the systematic development of a family of related models, known as variants (valid products), from a
common code base. Each variant is specified in terms of features (static con- 
figuration options) selected for that particular variant. SPLs are particularly 
popular in the embedded and critical system domains (e.g. cars, phones, avion- 
ics, healthcare). 

Lifted model checking [4,5] is a useful approach for verifying properties of 
variability models (SPLs). Given a variability model and a specification, the 
lifted model checking algorithm, unlike the standard non-lifted one, returns pre- 
cise conclusive results for all individual variants, that is, for each variant it 
reports whether it satisfies or violates the specification. The main disadvantage 
of lifted model checking is the configuration space explosion problem, which refers 
© The Author(s) 2019 
to the high number of variants in the variability model. In fact, exponentially many variants can be derived from only a few configuration options (features).
One of the most successful approaches to fighting the configuration space explo- 
sion are so-called variability abstractions [12,14,15,17]. They hide some of the 
configuration details, so that many of the concrete configurations become indis- 
tinguishable and can be collapsed into a single abstract configuration (variant). 
This results in smaller abstract variability models with a smaller number of 
abstract configurations. In order to be conservative w.r.t. the full CTL temporal 
logic, abstract variability models have two types of transitions: may-transitions 
which represent possible transitions in the concrete model, and must-transitions 
which represent the definite transitions in the concrete model. May and must 
transitions correspond to over- and under-approximations, and are needed in
order to preserve universal and existential CTL properties, respectively. 

Here we consider the 3-valued semantics for interpreting CTL formulae over 
abstract variability models. This semantics evaluates a formula on an abstract 
model to either true, false, or indefinite. Abstract variability models are designed 
to be conservative for both true and false. However, the indefinite answer gives 
no information on the value of the formula on the concrete model. In this case, 
a refinement is needed in order to make the abstract models more precise. 

The technique proposed here significantly extends the scope of existing 
automatic variability-specific abstraction refinement procedures [8,18], which 
currently support the verification of universal LTL properties only. They use 
conservative variability abstractions to construct over-approximated abstract 
variability models, which preserve LTL properties. If a spurious counterexample 
(introduced due to the abstraction) is found in the abstract model, the pro- 
cedures [8,18] use Craig interpolation to extract relevant information from it 
in order to define the refinement of abstract models. Variability abstractions 
that preserve all (universal and existential) CTL properties have been previ- 
ously introduced [12], but without an automatic mechanism for constructing 
them and without a notion of refinement. The abstractions of [12] have to be constructed
manually by an engineer before verification. In order to make the entire verifi- 
cation procedure automatic, we need to develop an abstraction and refinement 
framework for CTL properties. 

In this work, we propose the first variability-specific abstraction refinement 
procedure for automatically verifying arbitrary formulae of CTL. To achieve this 
aim, model checking games [24-26] represent the most suitable framework for 
defining the refinement. In this way, we establish a brand new connection between 
games and family-based (SPL) model checking. The refinement is defined by finding the reason for the indefinite result of an algorithm that solves the corresponding model checking game, which is played by two players: Player ∀ (trying to refute the formula Φ on an abstract model M) and Player ∃ (trying to verify Φ on M). The game is played on a game board, which consists of configurations of the form (s, Φ′), where s is a state of the abstract model M and Φ′ is a subformula of Φ, such that the value of Φ′ in s is relevant for determining the final model checking result. The players make moves between configurations in which they try to verify or refute Φ′ in s. All possible plays of a game are captured in the game-graph, whose nodes are the elements of the game board and whose edges are the possible moves of the players. The model checking game is solved via a coloring algorithm which colors each node (s, Φ′) in the game-graph by T, F, or ? iff the value of Φ′ in s is true, false, or indefinite, respectively. Player ∀ has a winning strategy at the node (s, Φ′) iff the node is colored by F iff Φ′ does not hold in s, and Player ∃ has a winning strategy at (s, Φ′) iff the node is colored by T iff Φ′ holds in s. In addition, it is also possible that neither player has a winning strategy, in which case the node is colored by ? and the value of Φ′ in s is indefinite. In this case, we want to refine the abstract model. We can find the reason for the tie by examining the game-graph. We choose a refinement criterion, which splits abstract configurations so that the new, refined abstract configurations represent smaller subsets of concrete configurations.


2 Background 


Variability Models. Let F = {A₁, ..., Aₙ} be a finite set of Boolean variables representing the features available in a variability model. A specific subset of features, k ⊆ F, known as a configuration, specifies a variant (valid product) of a variability model. We assume that only a subset K ⊆ 2^F of configurations is valid. An alternative representation of configurations is based upon propositional formulae. Each configuration k ∈ K can be represented by a formula: k(A₁) ∧ ... ∧ k(Aₙ), where k(Aᵢ) = Aᵢ if Aᵢ ∈ k, and k(Aᵢ) = ¬Aᵢ if Aᵢ ∉ k.

We use transition systems (TS) to describe behaviors of single-systems.


Definition 1. A transition system (TS) is a tuple T = (S, Act, trans, I, AP, L), where S is a set of states; Act is a set of actions; trans ⊆ S × Act × S is a transition relation which is total, so that for each state there is an outgoing transition; I ⊆ S is a set of initial states; AP is a set of atomic propositions; and L : S → 2^AP is a labelling function specifying which propositions hold in a state. We write s₁ −λ→ s₂ whenever (s₁, λ, s₂) ∈ trans.


An execution (behaviour) of a TS T is an infinite sequence ρ = s₀λ₁s₁λ₂... with s₀ ∈ I such that sᵢ −λᵢ₊₁→ sᵢ₊₁ for all i ≥ 0. The semantics of the TS T, denoted as ⟦T⟧_TS, is the set of its executions.

A featured transition system (FTS) is a particular instance of a variability 
model, which describes the behavior of a whole family of systems in a single 
monolithic description, where the transitions are guarded by a presence condition 
that identifies the variants they belong to. The presence conditions ψ are drawn from the set of feature expressions, FeatExp(F), which are propositional logic formulae over F: ψ ::= true | A ∈ F | ¬ψ | ψ₁ ∧ ψ₂. We write ⟦ψ⟧ to denote the set of configurations from K that satisfy ψ, i.e. k ∈ ⟦ψ⟧ iff k ⊨ ψ.


Definition 2. A featured transition system (FTS) represents a tuple 𝓕 = (S, Act, trans, I, AP, L, F, K, δ), where S, Act, trans, I, AP, and L form a TS; F is the set of available features; K is a set of valid configurations; and δ : trans → FeatExp(F) is a total function decorating transitions with presence conditions.
The projection of an FTS 𝓕 to a configuration k ∈ K, denoted as π_k(𝓕), is the TS (S, Act, trans′, I, AP, L), where trans′ = {t ∈ trans | k ⊨ δ(t)}. We lift the definition of projection to sets of configurations K′ ⊆ K, denoted as π_K′(𝓕), by keeping the transitions admitted by at least one of the configurations in K′. That is, π_K′(𝓕) is the FTS (S, Act, trans′, I, AP, L, F, K′, δ′), where trans′ = {t ∈ trans | ∃k ∈ K′. k ⊨ δ(t)} and δ′ = δ|_trans′ is the restriction of δ to trans′. The semantics of an FTS 𝓕, denoted as ⟦𝓕⟧_FTS, is the union of behaviours of the projections on all valid variants k ∈ K, i.e. ⟦𝓕⟧_FTS = ∪_{k∈K} ⟦π_k(𝓕)⟧_TS.
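The two projection operators can be sketched directly from their definitions. This is a minimal illustration, assuming presence conditions are encoded as Python predicates over configurations (frozensets of feature names); the `FTS` class, the features `p`/`q`, and the toy transitions are hypothetical and are not the VENDMACH model of Fig. 1. States, actions, AP and L are kept implicit in the transition tuples.

```python
from dataclasses import dataclass

@dataclass
class FTS:
    """Hypothetical encoding: each transition (src, action, dst) maps to its
    presence condition, a predicate over configurations (frozensets)."""
    initial: frozenset
    delta: dict
    configs: frozenset  # valid configurations K

def project(fts, k):
    """pi_k(F): transitions of the plain TS obtained for configuration k."""
    if k not in fts.configs:
        raise ValueError("k must be a valid configuration")
    return {t for t, pc in fts.delta.items() if pc(k)}

def project_set(fts, subset):
    """pi_K'(F): keep transitions admitted by at least one configuration in K'."""
    subset = frozenset(subset)
    if not subset <= fts.configs:
        raise ValueError("K' must be a subset of K")
    kept = {t: pc for t, pc in fts.delta.items() if any(pc(k) for k in subset)}
    return FTS(fts.initial, kept, subset)  # delta' is delta restricted to trans'

E, P, Q, PQ = frozenset(), frozenset({"p"}), frozenset({"q"}), frozenset({"p", "q"})
toy = FTS(
    initial=frozenset({"s0"}),
    delta={
        ("s0", "go", "s1"): lambda k: True,        # presence condition true
        ("s1", "stay", "s1"): lambda k: True,      # keeps every projection total
        ("s1", "back", "s0"): lambda k: "p" in k,  # only variants with p
        ("s1", "skip", "s1"): lambda k: "q" in k,  # only variants with q
    },
    configs=frozenset({E, P, Q, PQ}),
)
```

For instance, `project_set(toy, {E, Q})` keeps `go`, `stay` and `skip` but drops `back`, since no configuration in {∅, {q}} enables p.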

Modal transition systems (MTSs) [22] are a generalization of transition sys- 
tems equipped with two transition relations: must and may. The former (must) 
is used to specify the required behavior, while the latter (may) to specify the 
allowed behavior of a system. We will use MTSs for representing abstractions of 
FTSs. 


Definition 3. A modal transition system (MTS) is represented by a tuple M = (S, Act, trans^may, trans^must, I, AP, L), where trans^may ⊆ S × Act × S describes may transitions of M; trans^must ⊆ S × Act × S describes must transitions of M, such that trans^may is total and trans^must ⊆ trans^may.


A may-execution in M is an execution (infinite sequence) with all its transitions in trans^may; whereas a must-execution in M is a maximal sequence with all its transitions in trans^must, which cannot be extended with any other transition from trans^must. Note that since trans^must is not necessarily total, must-executions can be finite. We use ⟦M⟧^may_MTS (resp., ⟦M⟧^must_MTS) to denote the set of all may-executions (resp., must-executions) in M starting in an initial state.


Example 1. Throughout this paper, we will use a beverage vending machine as 
a running example [4]. Figure 1 shows the FTS of a VENDMACH family. It has 
two features, and each of them is assigned an identifying letter and a color. 
The features are: CancelPurchase (c, in brown), for canceling a purchase after 
a coin is entered; and FreeDrinks (f, in blue) for offering free drinks. Each 
transition is labeled by an action followed by a feature expression. For instance, 
the transition so BaL Sq is included in variants where the feature f is enabled. 
For clarity, we omit the presence condition true from transitions. There is only one atomic proposition served ∈ AP, which is abbreviated as r. Note that r ∈ L(s₂), whereas r ∉ L(s₀) and r ∉ L(s₁).

By combining various features, a number of variants of this VENDMACH can be obtained. The set of valid configurations is: K^VM = {∅, {c}, {f}, {c, f}} (or, equivalently, K^VM = {¬c ∧ ¬f, c ∧ ¬f, ¬c ∧ f, c ∧ f}). Figure 2 shows a basic version of VENDMACH that only serves a drink, described by the configuration ∅ (or, as a formula, ¬c ∧ ¬f). It takes a coin, serves a drink, and opens a compartment so the customer can take the drink. Figure 3 shows an MTS, where must transitions are denoted by solid lines, while may transitions are denoted by dashed lines.
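The two views of configurations used in Example 1 (sets of enabled features versus conjunctions of literals) and the denotation ⟦ψ⟧ can be made concrete with a small sketch. It assumes feature expressions are encoded as nested tuples following the grammar ψ ::= true | A | ¬ψ | ψ₁ ∧ ψ₂; the encoding and function names are illustrative choices, not taken from the paper's tooling.

```python
from itertools import combinations

FEATURES = ("c", "f")  # CancelPurchase and FreeDrinks from Example 1

def powerset(features):
    """All subsets of the feature set, i.e. 2^F, as frozensets."""
    return [frozenset(combo)
            for r in range(len(features) + 1)
            for combo in combinations(features, r)]

def holds(k, psi):
    """k |= psi, with psi encoded as ("true",), ("feat", A), ("not", p), ("and", p, q)."""
    tag = psi[0]
    if tag == "true":
        return True
    if tag == "feat":
        return psi[1] in k
    if tag == "not":
        return not holds(k, psi[1])
    if tag == "and":
        return holds(k, psi[1]) and holds(k, psi[2])
    raise ValueError(f"unknown feature expression: {psi!r}")

def denote(psi, valid):
    """[[psi]]: the valid configurations that satisfy psi."""
    return {k for k in valid if holds(k, psi)}

def as_formula(k, features):
    """Configuration k written as the conjunction k(A1) ∧ ... ∧ k(An)."""
    return " ∧ ".join(a if a in k else f"¬{a}" for a in features)

K_VM = powerset(FEATURES)  # all four configurations of VENDMACH are valid
```

Here `denote(("feat", "c"), K_VM)` yields {{c}, {c, f}}, the set ⟦c⟧ that reappears as the violating products in Example 2, and `as_formula(frozenset(), FEATURES)` yields `"¬c ∧ ¬f"`, the formula for the basic variant of Fig. 2.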


CTL Properties. We present Computation Tree Logic (CTL) [1] for specifying system properties. CTL state formulae Φ are given by:

$$\Phi ::= \mathit{true} \mid \mathit{false} \mid l \mid \Phi_1 \wedge \Phi_2 \mid \Phi_1 \vee \Phi_2 \mid A\varphi \mid E\varphi, \qquad \varphi ::= \bigcirc\Phi \mid \Phi_1 \,U\, \Phi_2 \mid \Phi_1 \,V\, \Phi_2$$

where l ∈ Lit = AP ∪ {¬a | a ∈ AP} and φ represents CTL path formulae. Note that the CTL state formulae Φ are given in negation normal form (¬ is applied only to atomic propositions). The path formula ◯Φ can be read as "in the next state Φ", Φ₁ U Φ₂ can be read as "Φ₁ until Φ₂", and its dual Φ₁ V Φ₂ can be read as "Φ₂ while not Φ₁" (where Φ₁ may never hold).

We assume the standard CTL semantics over TSs is given [1] (see also [16, Appendix A]). We write [T ⊨ Φ] = tt to denote that T satisfies the formula Φ, whereas [T ⊨ Φ] = ff denotes that T does not satisfy Φ.

We say that an FTS 𝓕 satisfies a CTL formula Φ, written [𝓕 ⊨ Φ] = tt, iff all its valid variants satisfy the formula, i.e. ∀k ∈ K. [π_k(𝓕) ⊨ Φ] = tt. Otherwise, we say 𝓕 does not satisfy Φ, written [𝓕 ⊨ Φ] = ff. In this case, we also want to determine a non-empty set of violating variants K′ ⊆ K, such that ∀k′ ∈ K′. [π_k′(𝓕) ⊨ Φ] = ff and ∀k ∈ K \ K′. [π_k(𝓕) ⊨ Φ] = tt.

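We define the 3-valued semantics of CTL over an MTS M slightly differently from the semantics for TSs. A CTL state formula Φ is satisfied in a state s of an MTS M, denoted [M, s ⊨³ Φ], iff (M is omitted when clear from context):¹

$$
(1)\quad [s \models^3 a] = \begin{cases} \mathit{tt}, & \text{if } a \in L(s) \\ \mathit{ff}, & \text{if } a \notin L(s) \end{cases}
\qquad
[s \models^3 \neg a] = \begin{cases} \mathit{tt}, & \text{if } a \notin L(s) \\ \mathit{ff}, & \text{if } a \in L(s) \end{cases}
$$

$$
(2)\quad [s \models^3 \Phi_1 \wedge \Phi_2] = \begin{cases} \mathit{tt}, & \text{if } [s \models^3 \Phi_1] = \mathit{tt} \text{ and } [s \models^3 \Phi_2] = \mathit{tt} \\ \mathit{ff}, & \text{if } [s \models^3 \Phi_1] = \mathit{ff} \text{ or } [s \models^3 \Phi_2] = \mathit{ff} \\ \bot, & \text{otherwise} \end{cases}
$$

$$
(3)\quad [s \models^3 A\varphi] = \begin{cases} \mathit{tt}, & \text{if } \forall \rho \in [\![M]\!]^{may,s}_{MTS}.\ [\rho \models^3 \varphi] = \mathit{tt} \\ \mathit{ff}, & \text{if } \exists \rho \in [\![M]\!]^{must,s}_{MTS}.\ [\rho \models^3 \varphi] = \mathit{ff} \\ \bot, & \text{otherwise} \end{cases}
\qquad
[s \models^3 E\varphi] = \begin{cases} \mathit{tt}, & \text{if } \exists \rho \in [\![M]\!]^{must,s}_{MTS}.\ [\rho \models^3 \varphi] = \mathit{tt} \\ \mathit{ff}, & \text{if } \forall \rho \in [\![M]\!]^{may,s}_{MTS}.\ [\rho \models^3 \varphi] = \mathit{ff} \\ \bot, & \text{otherwise} \end{cases}
$$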

where ⟦M⟧^{may,s}_MTS (resp., ⟦M⟧^{must,s}_MTS) denotes the set of all may-executions (must-executions) starting in the state s of M. Satisfaction of a path formula φ for a may- or must-execution ρ = s₀λ₁s₁λ₂... of an MTS M (we write ρᵢ = sᵢ to denote the i-th state of ρ, and |ρ| to denote the number of states in ρ), denoted [M, ρ ⊨³ φ], is defined as (M is omitted when clear from context):

$$
(4)\quad [\rho \models^3 (\Phi_1 \,U\, \Phi_2)] = \begin{cases}
\mathit{tt}, & \text{if } \exists\, 0 \le i < |\rho|.\ \big([\rho_i \models^3 \Phi_2] = \mathit{tt} \wedge (\forall j < i.\ [\rho_j \models^3 \Phi_1] = \mathit{tt})\big) \\
\mathit{ff}, & \text{if } \forall\, 0 \le i < |\rho|.\ \big((\forall j < i.\ [\rho_j \models^3 \Phi_1] \neq \mathit{ff}) \Rightarrow [\rho_i \models^3 \Phi_2] = \mathit{ff}\big) \\
 & \qquad \wedge\ \big((\forall i \ge 0.\ [\rho_i \models^3 \Phi_1] \neq \mathit{ff}) \Rightarrow |\rho| = \infty\big) \\
\bot, & \text{otherwise}
\end{cases}
$$

¹ See [16, Appendix A] for definitions of [s ⊨³ Φ₁ ∨ Φ₂], [ρ ⊨³ ◯Φ], and [ρ ⊨³ (Φ₁ V Φ₂)].


An MTS M satisfies a formula Φ, written [M ⊨³ Φ] = tt, iff ∀s₀ ∈ I. [s₀ ⊨³ Φ] = tt. We say that [M ⊨³ Φ] = ff if ∃s₀ ∈ I. [s₀ ⊨³ Φ] = ff. Otherwise, [M ⊨³ Φ] = ⊥.
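The propositional part of the 3-valued semantics, rules (1) and (2), and the lifting from initial states to the whole MTS can be written in a few lines. This is an illustrative sketch; the `TV` enum and the helper names are assumptions, and disjunction is deliberately omitted because its definition is deferred to [16, Appendix A].

```python
from enum import Enum

class TV(Enum):
    TT = "tt"
    FF = "ff"
    BOT = "⊥"  # indefinite

def lit(a, label, negated=False):
    """Rule (1): literals are always definite, given the label set L(s)."""
    holds = (a not in label) if negated else (a in label)
    return TV.TT if holds else TV.FF

def conj(v1, v2):
    """Rule (2): tt if both are tt, ff if either is ff, otherwise ⊥."""
    if v1 is TV.TT and v2 is TV.TT:
        return TV.TT
    if v1 is TV.FF or v2 is TV.FF:
        return TV.FF
    return TV.BOT

def mts_value(initial_values):
    """[M |=3 Phi] from the values [s0 |=3 Phi] of all initial states s0."""
    vals = list(initial_values)
    if all(v is TV.TT for v in vals):
        return TV.TT
    if any(v is TV.FF for v in vals):
        return TV.FF
    return TV.BOT
```

Note that `mts_value` behaves like `conj` folded over the initial states: a single refuting initial state makes the whole MTS ff, while any ⊥ without an ff leaves the result indefinite.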


Example 2. Consider the FTS VENDMACH and the MTS α^join(VENDMACH) in Figs. 1 and 3. The property Φ₁ = A(¬r U r) states that, starting from the initial state, every execution will eventually reach a state where r holds. Note that [VENDMACH ⊨ Φ₁] = ff. E.g., if the feature c is enabled, a counterexample where the state s₂ that satisfies r is never reached is: s₀ → s₁ → s₀ → .... The set of violating products is ⟦c⟧ = {{c}, {f, c}} ⊆ K^VM. However, [π_⟦¬c⟧(VENDMACH) ⊨ Φ₁] = tt. We also have that [α^join(VENDMACH) ⊨³ Φ₁] = ⊥, since (1) there is a may-execution in α^join(VENDMACH) where s₂ is never reached: s₀ → s₁ → s₀ → ..., and (2) there is no must-execution that violates Φ₁.

Consider the property Φ₂ = E(¬r U r), which describes a situation where in the initial state there exists an execution that will eventually reach s₂, which satisfies r. Note that [VENDMACH ⊨ Φ₂] = tt, since even for variants with the feature c there is a continuation from the state s₁ to s₂. But [α^join(VENDMACH) ⊨³ Φ₂] = ⊥, since (1) there is no must-execution in α^join(VENDMACH) that reaches s₂ from s₀, and (2) there is a may-execution that satisfies Φ₂.


3 Abstraction of FTSs 


We now introduce the variability abstractions [12] which preserve full CTL. We start working with Galois connections² between Boolean complete lattices of feature expressions, and then induce a notion of abstraction of FTSs.

The Boolean complete lattice of feature expressions (propositional formulae over F) is: (FeatExp(F)/≡, ⊨, ∨, ∧, true, false, ¬). The elements of the domain FeatExp(F)/≡ are equivalence classes of propositional formulae ψ ∈ FeatExp(F) obtained by quotienting by the semantic equivalence ≡. The ordering ⊨ is the standard entailment between propositional logic formulae, whereas the least upper bound and the greatest lower bound are just logical disjunction and conjunction, respectively. Finally, the constant false is the least element, true is the greatest element, and negation is the complement operator.


² ⟨L, ≤_L⟩ ⇄(α, γ) ⟨M, ≤_M⟩ is a Galois connection between complete lattices L (concrete domain) and M (abstract domain) iff α : L → M and γ : M → L are total functions that satisfy: α(l) ≤_M m ⟺ l ≤_L γ(m), for all l ∈ L, m ∈ M.
Over-approximating abstractions. The join abstraction, α^join, replaces each feature expression ψ with true if there exists at least one configuration from K that satisfies ψ. The abstract set of features is empty: α^join(F) = ∅, and the abstract set of configurations is a singleton: α^join(K) = {true}. The abstraction and concretization functions between FeatExp(F) and FeatExp(∅) are:

$$
\alpha^{join}(\psi) = \begin{cases} \mathit{true}, & \text{if } \exists k \in K.\ k \models \psi \\ \mathit{false}, & \text{otherwise} \end{cases}
\qquad
\gamma^{join}(\psi) = \begin{cases} \mathit{true}, & \text{if } \psi \text{ is } \mathit{true} \\ \bigvee_{k \in 2^{F} \setminus K} k, & \text{if } \psi \text{ is } \mathit{false} \end{cases}
$$

which form a Galois connection [15]. In this way, we obtain a single abstract variant that includes all transitions occurring in any variant.


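Under-approximating abstractions. The dual join abstraction, α̃^join, replaces each feature expression ψ with true if all configurations from K satisfy ψ. The abstraction and concretization functions between FeatExp(F) and FeatExp(∅), forming a Galois connection [12], are defined as [9]: α̃^join = ¬ ∘ α^join ∘ ¬ and γ̃^join = ¬ ∘ γ^join ∘ ¬, that is:

$$
\tilde{\alpha}^{join}(\psi) = \begin{cases} \mathit{true}, & \text{if } \forall k \in K.\ k \models \psi \\ \mathit{false}, & \text{otherwise} \end{cases}
\qquad
\tilde{\gamma}^{join}(\psi) = \begin{cases} \bigwedge_{k \in 2^{F} \setminus K} (\neg k), & \text{if } \psi \text{ is } \mathit{true} \\ \mathit{false}, & \text{if } \psi \text{ is } \mathit{false} \end{cases}
$$

In this way, we obtain a single abstract variant that includes only those transitions that occur in all variants.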
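Because FeatExp(F)/≡ is finite, both abstractions can be checked exhaustively on a small feature set. The sketch below assumes a semantic encoding (a feature expression is the set of configurations over 2^F that satisfy it, so entailment becomes set inclusion) and the two features c and f; it brute-forces the Galois condition of footnote 2 for the join pair and confirms that ¬ ∘ α^join ∘ ¬ coincides with the "all configurations in K satisfy ψ" reading of α̃^join.

```python
from itertools import chain, combinations

FEATURES = ("c", "f")
ALL = frozenset(frozenset(s) for s in chain.from_iterable(
    combinations(FEATURES, r) for r in range(len(FEATURES) + 1)))  # 2^F

# A feature expression is represented by the set of configurations in 2^F
# that satisfy it; entailment is then set inclusion.
TRUE, FALSE = ALL, frozenset()

def alpha_join(psi, K):
    """true iff some valid configuration satisfies psi."""
    return TRUE if psi & K else FALSE

def gamma_join(phi, K):
    """true maps to true; false maps to the disjunction of invalid configurations."""
    return TRUE if phi == TRUE else ALL - K

def neg(psi):
    return ALL - psi

def alpha_dual(psi, K):
    """Dual join abstraction: neg . alpha_join . neg."""
    return neg(alpha_join(neg(psi), K))

def all_expressions():
    """Every element of FeatExp(F)/≡, i.e. every subset of 2^F."""
    confs = list(ALL)
    return [frozenset(c) for r in range(len(confs) + 1)
            for c in combinations(confs, r)]

def is_galois(K):
    """alpha(psi) |= phi  <=>  psi |= gamma(phi), for all psi and abstract phi."""
    return all((alpha_join(psi, K) <= phi) == (psi <= gamma_join(phi, K))
               for psi in all_expressions() for phi in (TRUE, FALSE))
```

The abstract domain FeatExp(∅)/≡ contains only true and false, which is why `is_galois` ranges `phi` over just those two values.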


Abstract MTS and Preservation of CTL. Given a Galois connection (α^join, γ^join) defined on the level of feature expressions, we now define the abstraction of an FTS as an MTS with two transition relations: one (may) preserving universal properties, and the other (must) preserving existential properties. The may transitions describe the behaviour that is possible in some variant of the concrete FTS, but need not be realized in the other variants; whereas the must transitions describe behaviour that has to be present in all variants of the FTS.


Definition 4. Given the FTS 𝓕 = (S, Act, trans, I, AP, L, F, K, δ), define the MTS α^join(𝓕) = (S, Act, trans^may, trans^must, I, AP, L) to be its abstraction, where trans^may = {t ∈ trans | α^join(δ(t)) = true}, and trans^must = {t ∈ trans | α̃^join(δ(t)) = true}.
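Definition 4 amounts to two filters over the transitions: a transition is a may-transition if some valid configuration enables it, and a must-transition if every valid configuration does. A minimal sketch, assuming presence conditions are given as Python predicates over configurations; the function `abstract_mts`, the feature names and the toy transitions are hypothetical.

```python
def abstract_mts(delta, K):
    """Return (trans_may, trans_must) of alpha^join(F) following Definition 4.

    delta maps each transition (src, action, dst) to its presence condition,
    encoded as a predicate over the configurations in K (hypothetical encoding).
    """
    if not K:
        raise ValueError("K must be non-empty")
    may = {t for t, pc in delta.items() if any(pc(k) for k in K)}   # alpha^join = true
    must = {t for t, pc in delta.items() if all(pc(k) for k in K)}  # dual alpha^join = true
    return may, must

K = [frozenset(), frozenset({"p"})]  # hypothetical valid configurations
delta = {
    ("s0", "go", "s1"): lambda k: True,        # in every variant: must and may
    ("s1", "back", "s0"): lambda k: "p" in k,  # in some variant: may only
    ("s1", "stay", "s1"): lambda k: True,
    ("s0", "dead", "s0"): lambda k: "q" in k,  # in no valid variant: dropped
}
may, must = abstract_mts(delta, K)
```

With K non-empty, every transition enabled by all configurations is also enabled by some configuration, so trans^must ⊆ trans^may holds by construction, as Definition 3 requires. Totality of trans^may is not checked here.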


Note that the abstract model α^join(𝓕) has no variability in it, i.e. it contains only one abstract configuration. We now show that the 3-valued semantics of the MTS α^join(𝓕) is designed to be sound in the sense that it preserves both satisfaction (tt) and refutation (ff) of a formula from the abstract model to the concrete one. However, if the truth value of a formula in the abstract model is ⊥, then its value over the concrete model is not known. We prove [16, Appendix B]:
Theorem 1 (Preservation results). For every Φ ∈ CTL, we have:

(1) [α^join(𝓕) ⊨³ Φ] = tt ⟹ [𝓕 ⊨ Φ] = tt.
(2) [α^join(𝓕) ⊨³ Φ] = ff ⟹ [𝓕 ⊨ Φ] = ff and [π_k(𝓕) ⊨ Φ] = ff for all k ∈ K.


Divide-and-conquer strategy. The problem of evaluating [𝓕 ⊨ Φ] can be reduced to a number of smaller problems by partitioning the configuration space K. Let the subsets K₁, K₂, ..., Kₙ form a partition of the set K. Then, [𝓕 ⊨ Φ] = tt iff [π_{Kᵢ}(𝓕) ⊨ Φ] = tt for all i = 1, ..., n. Also, [𝓕 ⊨ Φ] = ff iff [π_{Kⱼ}(𝓕) ⊨ Φ] = ff for some 1 ≤ j ≤ n. By using Theorem 1, we obtain the following result.


Corollary 1. Let K₁, K₂, ..., Kₙ form a partition of K.

(1) If [α^join(π_{K₁}(𝓕)) ⊨³ Φ] = tt ∧ ... ∧ [α^join(π_{Kₙ}(𝓕)) ⊨³ Φ] = tt, then [𝓕 ⊨ Φ] = tt.
(2) If [α^join(π_{Kⱼ}(𝓕)) ⊨³ Φ] = ff for some 1 ≤ j ≤ n, then [𝓕 ⊨ Φ] = ff and [π_k(𝓕) ⊨ Φ] = ff for all k ∈ Kⱼ.
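Corollary 1 suggests a simple bookkeeping step once each block of the partition has been checked on its abstraction. The sketch below, with a hypothetical `combine_partition` helper and 3-valued results encoded as the strings "tt", "ff" and "bot", shows how the per-block verdicts determine the overall answer, the violating variants known so far, and the blocks that still need refinement.

```python
def combine_partition(results):
    """Combine abstract verdicts for a partition K1, ..., Kn of K (Corollary 1).

    `results` is a list of (block, value) pairs, where block is an iterable of
    configurations and value in {"tt", "ff", "bot"} is the 3-valued result of
    checking alpha^join(pi_Ki(F)). Returns (verdict, violating, undecided).
    """
    violating, undecided = set(), []
    for block, value in results:
        if value == "ff":
            violating |= set(block)   # Corollary 1 (2): every k in Kj violates
        elif value == "bot":
            undecided.append(block)   # indefinite: candidate for refinement
        elif value != "tt":
            raise ValueError(f"unexpected value {value!r}")
    if violating:
        verdict = "ff"   # F fails; undecided blocks may hide further violators
    elif not undecided:
        verdict = "tt"   # Corollary 1 (1): every block is tt
    else:
        verdict = "bot"  # refine the undecided blocks and re-check
    return verdict, violating, undecided
```

For example, hypothetical results `[(["k1", "k2"], "tt"), (["k3"], "ff"), (["k4"], "bot")]` yield `("ff", {"k3"}, [["k4"]])`. The set of violating variants is only a lower bound while some block remains undecided.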


Example 3. Recall the FTS VENDMACH of Fig. 1. Figure 3 shows the MTS α^join(VENDMACH), where the allowed (may) part of the behavior includes the transitions that are associated with the optional features c and f in VENDMACH, and the required (must) part includes transitions with the presence condition true. Consider the properties introduced in Example 2. We have [α^join(VENDMACH) ⊨³ Φ₁] = ⊥ and [α^join(VENDMACH) ⊨³ Φ₂] = ⊥, so we cannot conclude whether Φ₁ and Φ₂ are satisfied by VENDMACH or not.


4 Game-Based Abstract Lifted Model Checking 


The 3-valued model checking game [24,25] on an MTS M with state set S, a state s ∈ S, and a CTL formula Φ is played by Player ∀ and Player ∃ in order to evaluate Φ in s of M. The goal of Player ∀ is either to refute Φ on M or to prevent Player ∃ from verifying it. The goal of Player ∃ is either to verify Φ on M or to prevent Player ∀ from refuting it. The game board is the Cartesian product S × sub(Φ), where sub(Φ) is defined as:


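$$
\mathit{sub}(\Phi) = \begin{cases}
\{\Phi\}, & \text{if } \Phi = \mathit{true}, \mathit{false}, l \\
\{\Phi\} \cup \mathit{sub}(\Phi_1), & \text{if } \Phi = Q\bigcirc\Phi_1 \\
\{\Phi\} \cup \mathit{sub}(\Phi_1) \cup \mathit{sub}(\Phi_2), & \text{if } \Phi = \Phi_1 \wedge \Phi_2,\ \Phi_1 \vee \Phi_2 \\
\mathit{exp}(\Phi) \cup \mathit{sub}(\Phi_1) \cup \mathit{sub}(\Phi_2), & \text{if } \Phi = Q(\Phi_1 \,U\, \Phi_2),\ Q(\Phi_1 \,V\, \Phi_2)
\end{cases}
$$

where Q ranges over both A and E. The expansion exp(Φ) is defined as:

$$
\begin{aligned}
\Phi = Q(\Phi_1 \,U\, \Phi_2):&\quad \mathit{exp}(\Phi) = \{\Phi,\ \Phi_2 \vee (\Phi_1 \wedge Q\bigcirc\Phi),\ \Phi_1 \wedge Q\bigcirc\Phi,\ Q\bigcirc\Phi\} \\
\Phi = Q(\Phi_1 \,V\, \Phi_2):&\quad \mathit{exp}(\Phi) = \{\Phi,\ \Phi_2 \wedge (\Phi_1 \vee Q\bigcirc\Phi),\ \Phi_1 \vee Q\bigcirc\Phi,\ Q\bigcirc\Phi\}
\end{aligned}
$$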

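The definitions of sub(Φ) and exp(Φ) translate almost line by line into a recursive function. The sketch assumes a hypothetical tuple encoding of CTL formulae in negation normal form: `("true",)`, `("false",)`, `("lit", a)`, `("nlit", a)` for ¬a, `("and", p, q)`, `("or", p, q)`, `("AX", p)`/`("EX", p)` for A◯/E◯, and `("AU", p, q)`, `("EU", p, q)`, `("AV", p, q)`, `("EV", p, q)`.

```python
def exp(phi):
    """Expansion of Q(phi1 U phi2) or Q(phi1 V phi2), with Q in {A, E}."""
    op, p1, p2 = phi
    nxt = (op[0] + "X", phi)  # Q◯Φ
    if op[1] == "U":
        return {phi, ("or", p2, ("and", p1, nxt)), ("and", p1, nxt), nxt}
    if op[1] == "V":
        return {phi, ("and", p2, ("or", p1, nxt)), ("or", p1, nxt), nxt}
    raise ValueError(f"not an until/release formula: {phi!r}")

def sub(phi):
    """The subformulae relevant for the game board S × sub(Φ)."""
    op = phi[0]
    if op in ("true", "false", "lit", "nlit"):
        return {phi}
    if op in ("AX", "EX"):
        return {phi} | sub(phi[1])
    if op in ("and", "or"):
        return {phi} | sub(phi[1]) | sub(phi[2])
    if op in ("AU", "EU", "AV", "EV"):
        return exp(phi) | sub(phi[1]) | sub(phi[2])
    raise ValueError(f"unknown formula: {phi!r}")

phi1 = ("AU", ("nlit", "r"), ("lit", "r"))  # Φ₁ = A(¬r U r) from Example 2
```

For Φ₁ this gives six subformulae: the four elements of exp(Φ₁) plus the literals ¬r and r. Note that exp(Φ) mentions Φ inside Q◯Φ without recursing into it again, which keeps sub(Φ) finite.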

A single play from (s, Φ) is a possibly infinite sequence of configurations C₀ →_{p₀} C₁ →_{p₁} C₂ →_{p₂} ..., where C₀ = (s, Φ), Cᵢ ∈ S × sub(Φ), and pᵢ ∈ {Player ∀, Player ∃}. The subformula in Cᵢ determines which player pᵢ makes the next move. The possible moves at each configuration are:
(1) Cᵢ = (s, false), Cᵢ = (s, true), Cᵢ = (s, l): the play is finished. Such configurations are called terminal.
(2) if Cᵢ = (s, A◯Φ), Player ∀ chooses a must-transition s → s′ (for refutation) or a may-transition s → s′ of M (to prevent satisfaction), and Cᵢ₊₁ = (s′, Φ).
(3) if Cᵢ = (s, E◯Φ), Player ∃ chooses a must-transition s → s′ (for satisfaction) or a may-transition s → s′ of M (to prevent refutation), and Cᵢ₊₁ = (s′, Φ).
(4) if Cᵢ = (s, Φ₁ ∧ Φ₂), then Player ∀ chooses j ∈ {1, 2} and Cᵢ₊₁ = (s, Φⱼ).
(5) if Cᵢ = (s, Φ₁ ∨ Φ₂), then Player ∃ chooses j ∈ {1, 2} and Cᵢ₊₁ = (s, Φⱼ).
(6), (7) if Cᵢ = (s, Q(Φ₁ U Φ₂)), then Cᵢ₊₁ = (s, Φ₂ ∨ (Φ₁ ∧ Q◯Q(Φ₁ U Φ₂))).
(8), (9) if Cᵢ = (s, Q(Φ₁ V Φ₂)), then Cᵢ₊₁ = (s, Φ₂ ∧ (Φ₁ ∨ Q◯Q(Φ₁ V Φ₂))).


The moves (6)-(9) are deterministic; thus, any player can make them.
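The moves (1)-(9) above can be read as a successor function on configurations, and a breadth-first search over that function from I × {Φ} builds the game-graph described further below. The following sketch assumes the same hypothetical tuple encoding of formulae as for sub(Φ), and an MTS given as two dictionaries mapping a state to its may- and must-successors (must ⊆ may, may total); the toy MTS is invented for illustration and is not Fig. 3. Edges are labelled "must" when they follow a must-transition, "may" when they follow a may-only transition, and "aux" otherwise. The coloring algorithm is not shown.

```python
from collections import deque

def unfold(phi):
    """Deterministic moves (6)-(9): unfold Q(phi1 U phi2) / Q(phi1 V phi2) once."""
    op, p1, p2 = phi
    nxt = (op[0] + "X", phi)
    if op[1] == "U":
        return ("or", p2, ("and", p1, nxt))
    return ("and", p2, ("or", p1, nxt))

def moves(config, may, must):
    """Successor configurations of (s, phi), each with its edge kind."""
    s, phi = config
    op = phi[0]
    if op in ("true", "false", "lit", "nlit"):
        return []  # move (1): terminal configuration
    if op in ("AX", "EX"):
        # moves (2)/(3): progress edges follow real transitions of the MTS
        return [((t, phi[1]), "must" if t in must.get(s, ()) else "may")
                for t in sorted(may[s])]
    if op in ("and", "or"):
        # moves (4)/(5): the relevant player picks a conjunct or disjunct
        return [((s, phi[1]), "aux"), ((s, phi[2]), "aux")]
    if op in ("AU", "EU", "AV", "EV"):
        return [((s, unfold(phi)), "aux")]
    raise ValueError(f"unknown formula: {phi!r}")

def game_graph(initial, phi, may, must):
    """BFS from I x {phi}; returns the node set and the labelled edge list."""
    start = [(s, phi) for s in initial]
    nodes, edges = set(start), []
    queue = deque(start)
    while queue:
        n = queue.popleft()
        for m, kind in moves(n, may, must):
            edges.append((n, m, kind))
            if m not in nodes:
                nodes.add(m)
                queue.append(m)
    return nodes, edges

# Hypothetical MTS: s1 has only may-successors.
may = {"s0": {"s1"}, "s1": {"s0", "s2"}, "s2": {"s2"}}
must = {"s0": {"s1"}, "s2": {"s2"}}
phi = ("AU", ("nlit", "r"), ("lit", "r"))
nodes, edges = game_graph(["s0"], phi, may, must)
```

The search terminates because every configuration lies in the finite board S × sub(Φ); for this toy MTS it produces 18 nodes (six subformulae at each of three states) and 19 edges.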

A play is a maximal play iff it is infinite or ends in a terminal configuration. A play is infinite [26] iff there is exactly one subformula of the form AU, AV, EU, or EV that occurs infinitely often in the play. Such a subformula is called a witness. We have the following winning criteria:


- Player ∀ wins a (maximal) play iff in each configuration of the form Cᵢ = (s, A◯Φ), Player ∀ chooses a move based on must-transitions and one of the following holds: (1) the play is finite and ends in a terminal configuration of the form Cᵢ = (s, false) or Cᵢ = (s, a) where a ∉ L(s) or Cᵢ = (s, ¬a) where a ∈ L(s); (2) the play is infinite and the witness is of the form AU or EU.

- Player ∃ wins a (maximal) play iff in each configuration of the form Cᵢ = (s, E◯Φ), Player ∃ chooses a move based on must-transitions and one of the following holds: (1) the play is finite and ends in a terminal configuration of the form Cᵢ = (s, true) or Cᵢ = (s, a) where a ∈ L(s) or Cᵢ = (s, ¬a) where a ∉ L(s); (2) the play is infinite and the witness is of the form AV or EV.

- Otherwise, the play ends in a tie.


A strategy is a set of rules for a player, telling the player which move to choose in the current configuration. A winning strategy from (s, Φ) is a set of rules allowing the player to win every play that starts at (s, Φ) if the player plays by the rules. It was shown in [24,25] that the model checking problem of evaluating [(M, s) ⊨ Φ] can be reduced to the problem of finding which player has a winning strategy from (s, Φ) (i.e., to solving the given 3-valued model checking game).

The algorithm proposed in [24,25] for solving the given 3-valued model checking game consists of two parts. First, it constructs a game-graph; then it runs an algorithm for coloring the game-graph. The game-graph is G_{M×Φ} = (N, E), where N ⊆ S × sub(Φ) is the set of nodes and E ⊆ N × N is the set of edges. N contains a node for each configuration that was reached during the construction of the game-graph, which starts from the initial configurations I × {Φ} in a BFS manner, and E contains an edge for each possible move that was applied. The nodes of the game-graph can be classified as: terminal nodes, ∧-nodes, ∨-nodes, A○-nodes, and E○-nodes. Similarly, the edges can be classified as: progress edges, which originate in A○- or E○-nodes and reflect real transitions of the MTS M, and auxiliary edges, which are all other edges. We distinguish two types of progress edges, two types of children, and two types of SCCs (Strongly Connected Components). Must-edges (may-edges) are edges based on must-transitions (may-transitions) of MTSs. A node n' is a must-child (may-child) of the node n if there exists a must-edge (may-edge) (n, n'). A must-SCC (may-SCC) is an SCC in which all progress edges are must-edges (may-edges).
The game-graph is partitioned into its may-Maximal SCCs (may-MSCCs), denoted Qi's. This partition induces a partial order ≤ on the Qi's, such that edges go out of a set Qi only to itself or to a smaller set Qj. The partial order is extended to a total order ≤ arbitrarily. The coloring algorithm processes the Qi's according to ≤, bottom-up. Let Qi be the smallest set that is not fully colored. The nodes of Qi are colored in two phases, as follows.
Phase 1. Apply these rules to all nodes in Qi until none of them is applicable.


- A terminal node C is colored: by T if Player ∃ wins in it (when C = (s, true), or C = (s, a) with a ∈ L(s), or C = (s, ¬a) with a ∉ L(s)); and by F if Player ∀ wins in it (when C = (s, false), or C = (s, a) with a ∉ L(s), or C = (s, ¬a) with a ∈ L(s)).

- An A○-node is colored: by T if all its may-children are colored by T; by F if it has a must-child colored by F; by ? if all its must-children are colored by T or ?, and it has a may-child colored by F or ?.

- An E○-node is colored: by T if it has a must-child colored by T; by F if all its may-children are colored by F; by ? if it has a may-child colored by T or ?, and all its must-children are colored by F or ?.

- An ∧-node (∨-node) is colored: by T (F) if both its children are colored by T (F); by F (T) if it has a child that is colored by F (T); by ? if it has a child colored by ? and the other child is colored by ? or T (F).
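The Phase 1 rules are local: a node's color depends only on the colors of its children. A minimal Python sketch of the four non-terminal rules follows. It assumes colors are the strings "T", "F", "?", that None marks a child not yet colored, and that the may-children list also contains the must-children (every must-transition of an MTS is also a may-transition). A function returning None means the rule is not applicable yet.

```python
from typing import List, Optional

T, F, Q = "T", "F", "?"
Color = Optional[str]  # None: the child is not colored yet


def color_ax(must: List[Color], may: List[Color]) -> Color:
    """A○-node rule; `may` is assumed to also list the must-children."""
    if all(c == T for c in may):
        return T
    if any(c == F for c in must):
        return F
    if all(c in (T, Q) for c in must) and any(c in (F, Q) for c in may):
        return Q
    return None


def color_ex(must: List[Color], may: List[Color]) -> Color:
    """E○-node rule, the dual of color_ax."""
    if any(c == T for c in must):
        return T
    if all(c == F for c in may):
        return F
    if any(c in (T, Q) for c in may) and all(c in (F, Q) for c in must):
        return Q
    return None


def color_and(a: Color, b: Color) -> Color:
    """∧-node rule for the two children a and b."""
    if a == F or b == F:
        return F
    if a == T and b == T:
        return T
    if Q in (a, b) and a in (T, Q) and b in (T, Q):
        return Q
    return None


def color_or(a: Color, b: Color) -> Color:
    """∨-node rule, the dual of color_and."""
    if a == T or b == T:
        return T
    if a == F and b == F:
        return F
    if Q in (a, b) and a in (F, Q) and b in (F, Q):
        return Q
    return None
```

For instance, `color_ax(["T"], ["T", "F"])` yields "?": an A○-node whose only F child is reachable through a may-edge, which is the situation the first case of Lemma 1 calls a failure node.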


Phase 2. If, after propagating the rules of Phase 1, some nodes in Qi remain uncolored, then Qi must be a non-trivial may-MSCC that has exactly one witness. We consider two cases.


Case U. The witness is of the form A(Φ1 U Φ2) or E(Φ1 U Φ2).
Phase 2a. Repeatedly color by ? each node in Qi that satisfies one of the following conditions, until there is no change:
(1) an A○-node all of whose must-children are colored by T or ?; (2) an E○-node that has a may-child colored by T or ?; (3) an ∧-node both of whose children are colored by T or ?; (4) an ∨-node that has a child colored by T or ?.
In fact, each node for which the F option is no longer possible according to the rules of Phase 1 is colored by ?.
Phase 2b. Color the remaining nodes in Qi by F.

Case V. The witness is of the form A(Φ1 V Φ2) or E(Φ1 V Φ2) (see [16, Appendix B]).
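For Case U, Phase 2 is a fixpoint computation (Phase 2a) followed by a default color (Phase 2b). The Python sketch below assumes that Phase 1 has already run, that every child outside Qi is already colored, and a hypothetical node encoding `(kind, must_children, may_children)` with kind in {"AX", "EX", "AND", "OR"}; it is an illustration of the rules above, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

T, F, Q = "T", "F", "?"

# Hypothetical encoding: node -> (kind, must_children, may_children).
# For "AND"/"OR" nodes both lists hold the node's children.
Graph = Dict[str, Tuple[str, List[str], List[str]]]


def phase2_case_u(qi: List[str], graph: Graph,
                  color: Dict[str, Optional[str]]) -> None:
    """Colors the nodes of Qi left uncolored by Phase 1 (AU/EU witness)."""

    def t_or_q(n: str) -> bool:
        return color.get(n) in (T, Q)

    changed = True
    while changed:  # Phase 2a: repeat until no node changes
        changed = False
        for n in qi:
            if color.get(n) is not None:
                continue
            kind, must, may = graph[n]
            if ((kind == "AX" and all(t_or_q(c) for c in must))
                    or (kind == "EX" and any(t_or_q(c) for c in may))
                    or (kind == "AND" and all(t_or_q(c) for c in may))
                    or (kind == "OR" and any(t_or_q(c) for c in may))):
                color[n] = Q
                changed = True
    for n in qi:  # Phase 2b: everything still uncolored becomes F
        if color.get(n) is None:
            color[n] = F
```

On a toy cycle x → y → x, where x is an A○-node and y is an ∨-node whose other child is already colored F, a must-edge from x to y leaves both nodes F (an infinite play with an AU witness is won by Player ∀), while turning that edge into a may-edge only makes both nodes ?.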


The result of the coloring is a 3-valued coloring function χ : N → {T, F, ?}.
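The bottom-up order on the Qi's can be obtained from any SCC algorithm that returns components in reverse topological order. The sketch below uses Tarjan's algorithm (our choice; the source does not prescribe one) and assumes the game-graph is given as an adjacency dict that contains all edges, must and may alike. It is recursive, so it is only suited to small graphs.

```python
from typing import Dict, Hashable, List


def sccs_bottom_up(graph: Dict[Hashable, List[Hashable]]) -> List[List[Hashable]]:
    """Tarjan's algorithm. Components are returned in reverse topological
    order: a component appears only after every component it has edges into."""
    index: Dict[Hashable, int] = {}
    low: Dict[Hashable, int] = {}
    on_stack = set()
    stack: List[Hashable] = []
    result: List[List[Hashable]] = []
    counter = 0

    def visit(v: Hashable) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return result
```

The coloring then walks the returned list from the front, so each Qi is processed only after all the sets its edges lead to have been colored.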


Theorem 2 ([24]). For each n = (s, Φ') ∈ G_{M×Φ}:

(1) [(M, s) ⊨ Φ'] = tt iff χ(n) = T iff Player ∃ has a winning strategy at n.
(2) [(M, s) ⊨ Φ'] = ff iff χ(n) = F iff Player ∀ has a winning strategy at n.
(3) [(M, s) ⊨ Φ'] = ⊥ iff χ(n) = ? iff neither player has a winning strategy at n.

Fig. 4. The colored game-graph for α^join(VENDMACH) and Φ1 = A(¬r U r); the failure node (s1, A○A(¬r U r)) is marked. (Color figure online)


Using Theorems 1 and 2, given the colored game-graph of the MTS α^join(F): if all its initial nodes are colored by T, then [F ⊨ Φ] = tt; if at least one of them is colored by F, then [F ⊨ Φ] = ff. Otherwise, the answer is indefinite.


Example 4. The colored game-graph for the MTS α^join(VENDMACH) and Φ1 = A(¬r U r) is shown in Fig. 4. Green, red (with dashed borders), and white nodes denote nodes colored by T, F, and ?, respectively. The partitions Q1 to Q6 each consist of a single node shown in Fig. 4, while Q7 contains all the other nodes. The initial node (s0, Φ1) is colored by ?, so we obtain an indefinite answer.


5 Incremental Refinement Framework 


Given an FTS π_{K'}(F) with a configuration set K' ⊆ K, we show how to exploit the game-graph of the abstract MTS M = α^join(π_{K'}(F)) in order to refine the abstraction when model checking yields an indefinite answer. The refinement consists of two parts. First, we use the information gained by the coloring algorithm on G_{M×Φ} to split the single abstract configuration true ∈ α^join(K'), which represents the whole concrete configuration set K'. We then construct the refined abstract models, using the refined abstract configurations.
Algorithm. Verify(F, K, Φ)

1 Check by the game-based model checking algorithm: [α^join(F) ⊨ Φ]?

2 If the result is tt, then return that Φ is satisfied for all variants in K. If the result is ff, then return that Φ is violated for all variants in K.

3 Otherwise, an indefinite result is returned. Let the may-edge from n = (s, Φ1) to n' = (s', Φ1') be the reason for failure, and let ψ be the feature expression guarding the transition from s to s' in F. We generate F1 = π_{[ψ]}(F) and F2 = π_{[¬ψ]}(F), and call Verify(F1, K ∩ [ψ], Φ) and Verify(F2, K ∩ [¬ψ], Φ).

Fig. 5. The refinement procedure that checks [F ⊨ Φ].
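The recursion in Fig. 5 can be illustrated with a toy oracle in place of the game-based model checker. In this Python sketch the hypothetical `toy_check` returns "tt" or "ff" when all variants in K agree and otherwise "?" together with a single feature that splits K; the single feature stands in for the feature expression ψ guarding the failure transition. Real abstract checking can also return "?" when all variants agree, so this only shows the shape of the procedure.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Config = FrozenSet[str]  # the set of enabled features of one variant
Check = Callable[[List[Config]], Tuple[str, Optional[str]]]


def verify(configs: List[Config], check: Check,
           results: Dict[Config, bool]) -> int:
    """Refinement in the style of Fig. 5; returns the number of checks made."""
    verdict, psi = check(configs)
    if verdict in ("tt", "ff"):
        for c in configs:  # a definite answer holds for every variant in K
            results[c] = verdict == "tt"
        return 1
    with_psi = [c for c in configs if psi in c]         # K ∩ [ψ]
    without_psi = [c for c in configs if psi not in c]  # K ∩ [¬ψ]
    return (1 + verify(with_psi, check, results)
            + verify(without_psi, check, results))


def toy_check(truth: Dict[Config, bool], features: List[str]) -> Check:
    """Hypothetical oracle: definite iff all variants agree, else a split feature."""
    def check(configs: List[Config]) -> Tuple[str, Optional[str]]:
        values = {truth[c] for c in configs}
        if values == {True}:
            return "tt", None
        if values == {False}:
            return "ff", None
        for f in features:
            if 0 < sum(f in c for c in configs) < len(configs):
                return "?", f
        raise ValueError("distinct variants must differ in some feature")
    return check


features = ["A1", "A2"]
all_configs = [frozenset(f for f, on in zip(features, bits) if on)
               for bits in product([False, True], repeat=len(features))]
truth = {c: len(c) > 0 for c in all_configs}  # violated only when all are off
results: Dict[Config, bool] = {}
calls = verify(all_configs, toy_check(truth, features), results)
```

Here the first check is indefinite and splits on A1; K ∩ [A1] is settled at once, while K ∩ [¬A1] needs one more split on A2, giving five checks in total.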


Every indefinite answer is associated with a failure node and a failure reason. The goal of the refinement is to find and eliminate at least one of the failure reasons.


Definition 5. A node n is a failure node if it is colored by ?, whereas none of 
its children was colored by ? at the time n got colored by the coloring algorithm. 


Such a failure node can be seen as the point where the loss of information occurred, so we can use it in the refinement step to change the final model checking result.


Lemma 1 ([24]). A failure node is one of the following.

- An A○-node (E○-node) that has a may-child colored by F (T).
- An A○-node (E○-node) that was colored during Phase 2a based on an AU (AV) witness, and has a may-child colored by ?.


Given a failure node n = (s, Φ), suppose that its may-child is n' = (s', Φ1), as identified in Lemma 1. Then the may-edge from n to n' is considered the failure reason. Since the failure reason is a may-transition in the abstract MTS α^join(π_{K'}(F)), it needs to be refined so that it becomes either a must-transition or no transition at all. Let the transition from s to s' guarded by the feature expression ψ be the transition in the concrete model π_{K'}(F) corresponding to the above (failure) may-transition. We split the configuration space K' into the subsets [ψ] and [¬ψ], and we partition π_{K'}(F) into π_{[ψ]∩K'}(F) and π_{[¬ψ]∩K'}(F). Then, we repeat the verification process based on the abstract models α^join(π_{[ψ]∩K'}(F)) and α^join(π_{[¬ψ]∩K'}(F)). Note that in the former, α^join(π_{[ψ]∩K'}(F)), the transition from s to s' becomes a must-transition, while in the latter, α^join(π_{[¬ψ]∩K'}(F)), it is removed. The complete refinement procedure is shown in Fig. 5. We prove that (see [16, Appendix A]):


Theorem 3. The procedure Verify(F, K, Φ) terminates and is correct.


Example 5. We can do a failure analysis on the game-graph of α^join(VENDMACH) in Fig. 4. The failure node is (s1, A○A(¬r U r)) and the reason is the may-edge (s1, A○A(¬r U r)) ⇢ (s0, A(¬r U r)). The corresponding concrete transition in VENDMACH goes from s1 to s0, and its guard involves the feature c. So, we partition the configuration space K^VM into the subsets [c] and [¬c], and in the second iteration we consider the FTSs π_{[c]}(VENDMACH) and π_{[¬c]}(VENDMACH).
Fig. 6. G_{α^join(π_{[c]}(VENDMACH))×Φ1}
Fig. 7. α^join(π_{[c]}(VENDMACH))


The game-based model checking algorithm provides us with a convenient framework to use results from previous iterations and avoid unnecessary calculations. At the end of the i-th iteration of abstraction-refinement, we remember those nodes that were colored by definite colors. Let D denote the set of such nodes, and let χ_D : D → {T, F} be the coloring function that maps each node in D to its definite color. The incremental approach uses this information both in the construction of the game-graph and in its coloring. During the construction of a new refined game-graph, performed in a BFS manner in the next, (i+1)-th, iteration, we prune the game-graph at nodes from D. When a node n ∈ D is encountered, we add n to the game-graph and do not continue to construct the game-graph from n onwards. That is, n ∈ D is considered a terminal node and is colored by its previous color. As a result of this pruning, only the reachable sub-graph that was previously colored by ? is refined.
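The pruning can be sketched as an ordinary BFS that refuses to expand nodes of D. In this minimal Python sketch, nodes are opaque hashable values standing for (state, subformula) pairs, `successors` is a hypothetical callback giving the moves of the refined abstract model, and the dict `definite` plays the role of χ_D.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple


def build_pruned(initial: Iterable[Hashable],
                 successors: Callable[[Hashable], Iterable[Hashable]],
                 definite: Dict[Hashable, str]):
    """BFS game-graph construction that does not expand nodes in D.

    `definite` maps nodes colored T or F in an earlier iteration to that color."""
    nodes: Set[Hashable] = set()
    edges: List[Tuple[Hashable, Hashable]] = []
    colors: Dict[Hashable, str] = {}
    queue = deque()
    for n in initial:
        if n not in nodes:
            nodes.add(n)
            queue.append(n)
    while queue:
        n = queue.popleft()
        if n in definite:
            colors[n] = definite[n]  # treat as terminal: reuse the old color
            continue
        for m in successors(n):
            edges.append((n, m))
            if m not in nodes:
                nodes.add(m)
                queue.append(m)
    return nodes, edges, colors
```

For a toy graph a → {b, c}, b → d, c → e with b ∈ D colored T, the construction keeps b as a pre-colored leaf and never reaches d.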


Example 6. The property Φ1 holds for π_{[¬c]}(VENDMACH): the initial node of the game-graph G_{α^join(π_{[¬c]}(VENDMACH))×Φ1} (see [16, Fig. 13, Appendix C]) is colored by T. On the other hand, we obtain an indefinite answer for π_{[c]}(VENDMACH). The model α^join(π_{[c]}(VENDMACH)) is shown in Fig. 7, whereas the final colored game-graph G_{α^join(π_{[c]}(VENDMACH))×Φ1} is given in Fig. 6. The failure node is (s0, A○A(¬r U r)), and the reason is the may-edge (s0, A○A(¬r U r)) ⇢ (s1, A(¬r U r)). The corresponding concrete transition in π_{[c]}(VENDMACH) goes from s0 to s1, and its guard involves the feature f. So, in the third iteration we consider the FTSs π_{[c∧f]}(VENDMACH) and π_{[c∧¬f]}(VENDMACH).

The initial node of the graph G_{α^join(π_{[c∧¬f]}(VENDMACH))×Φ1} (see [16, Fig. 16, Appendix C]) is colored by F in Phase 2b. The initial node of G_{α^join(π_{[c∧f]}(VENDMACH))×Φ1} (see [16, Fig. 17, Appendix C]) is colored by T.
In the end, we conclude that Φ1 is satisfied by the variants {¬c ∧ ¬f, ¬c ∧ f, c ∧ f}, and Φ1 is violated by the variant {c ∧ ¬f}.

On the other hand, we need two iterations to conclude that Φ2 = E(¬r U r) is satisfied by all variants in K^VM (see [16, Appendix D] for details).


6 Evaluation 


To evaluate our approach, we use a synthetic example to demonstrate specific characteristics of our approach, and the ELEVATOR model, which is often used as a benchmark in the SPL community [4,12,15,20,23]. We compare (1) our abstraction-refinement procedure Verify, with the game-based model checking algorithm implemented from scratch in Java, against (2) the family-based version of the NUSMV model checker, denoted fNUSMV, which implements the standard lifted model checking algorithm [5]. For each experiment, we measure TIME, the time to perform an analysis task, and CALL, the number of times an approach calls the model checking engine. All experiments were executed on a 64-bit Intel® Core™ i5-3337U CPU running at 1.80 GHz with 8 GB memory. All experimental data is available from: https://aleksdimovski.github.io/automatic-ctl.html.


Synthetic example. The FTS Mn (where n > 0) consists of n features A1, ..., An and an integer data variable x, such that the set AP consists of all evaluations of x which assign nonnegative integer values to x. The set of valid configurations is Kn = 2^{{A1,...,An}}. Mn has a tree-like structure, where the root is the initial state with x = 0. In each level k (k ≥ 1), there are two states that can be reached by two transitions leading from a state in the previous level. One transition is allowable for variants with the feature Ak enabled, so that in the target state the variable's value is x + 2^{k-1}, where x is its value in the source state, whereas the other transition is allowable for variants with Ak disabled, so that the value of x does not change. For example, M2 is shown in Fig. 8, where in each state we show the current value of x and all transitions have the silent action τ.

We consider two properties: Φ = A(true U (x ≥ 0)) and Φ' = A(true U (x ≥ 1)). The property Φ is satisfied by all variants in Kn, whereas Φ' is violated only by one configuration, ¬A1 ∧ ... ∧ ¬An (where all features are disabled). We have verified Mn against Φ and Φ' using fNUSMV (e.g., see the fNUSMV models for M1 and M2 in [16, Fig. 23, Appendix E]). We have also checked Mn using our Verify procedure. For Φ, Verify terminates in one iteration since α^join(Mn) satisfies Φ (see G_{α^join(Mn)×Φ} in [16, Fig. 24, Appendix E]). For Φ', Verify needs n + 1 iterations. First, an indefinite result is reported for α^join(Mn) (e.g., see G_{α^join(Mn)×Φ'} in [16, Fig. 27, Appendix E]), and the configuration space is split into the subsets [¬A1] and [A1]. The refinement procedure proceeds in this way until we obtain definite results for all variants. The performance results are shown in Fig. 9. Notice that fNUSMV reports all results in only one iteration.
As n grows, Verify becomes faster than fNUSMV. For n = 11 (|K| = 2^11), fNUSMV times out after 2 h. In contrast, Verify is feasible even for large values of n.
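Because exactly one of the two transitions at each level is enabled in a fixed variant, each variant of Mn has a single path, and the final value of x encodes the variant as a binary number. The brute-force Python sketch below checks the two properties variant by variant, assuming that the path stops changing x after level n (the source does not say how leaves behave) and reading A(true U (x ≥ c)) on that single path as "some value on the path is at least c". It is a sanity check of the claimed results, not the lifted algorithm.

```python
from itertools import product
from typing import List, Tuple


def path_values(config: Tuple[bool, ...]) -> List[int]:
    """Values of x along the unique path of M_n for one configuration;
    config[k-1] is True iff feature A_k is enabled."""
    x, values = 0, [0]
    for k, enabled in enumerate(config, start=1):
        if enabled:
            x += 2 ** (k - 1)  # the A_k-transition adds 2^(k-1)
        values.append(x)
    return values


def violating_configs(n: int, threshold: int) -> List[Tuple[bool, ...]]:
    """Configurations whose path never reaches x >= threshold."""
    return [cfg for cfg in product([False, True], repeat=n)
            if not any(v >= threshold for v in path_values(cfg))]
```

With n = 3, `violating_configs(3, 0)` is empty (Φ holds everywhere) and `violating_configs(3, 1)` contains only the all-disabled configuration, matching the claims for Φ and Φ'.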
ELEVATOR. We have experimented with the ELEVATOR model with four floors, designed by Plath and Ryan [23]. It contains about 300 LOC of fNUSMV code and 9 independent optional features that modify the basic behaviour of the elevator, thus yielding 2^9 = 512 variants. To use our Verify procedure, we have manually translated the fNUSMV model into an FTS and then called Verify on it. The basic ELEVATOR system consists of a single lift that travels between four floors. There are four platform buttons and a single lift, which declares the variables floor, door, direction, and a further four cabin buttons. When serving a floor, the lift door opens and closes again. We consider three properties: "Φ1 = E(tt U (floor=1 ∧ idle ∧ door=closed))", "Φ2 = A(tt U (floor=1 ∧ idle ∧ door=closed))", and "Φ3 = E(tt U ((floor=3 ∧ ¬liftBut3.pressed ∧ direction=up) ⇒ door=closed))". The performance results are shown in Fig. 10. The properties Φ1 and Φ2 are satisfied by all variants, so Verify achieves speed-ups of 28 times for Φ1 and 2.7 times for Φ2 compared to the fNUSMV approach. fNUSMV takes 1.76 s to check Φ3, whereas Verify ends in 0.67 s, thus giving a 2.6 times speed-up.


7 Related Work and Conclusion 


There are different formalisms for representing variability models [2,21]. Classen 
et al. [4] present Featured Transition Systems (FTSs). They show how specifically 
designed lifted model checking algorithms [5,7] can be used for verifying FTSs 
against LTL and CTL properties. The variability abstractions that preserve LTL 
are introduced in [14,15,17], and subsequently automatic abstraction refinement procedures [8,18] for lifted model checking of LTL have been proposed, which use Craig interpolation to define the refinement. The variability abstractions that preserve
the full CTL are introduced in [12], but they are constructed manually and 
no notion of refinement is defined there. In this paper, we define an automatic 
abstraction refinement procedure for lifted model checking of full CTL by using 
games to define the refinement. To the best of our knowledge, this is the first 
such procedure in lifted model checking. 

One of the earliest attempts at using games for CTL model checking was proposed by Stirling [26]. Shoham and Grumberg [8,19,24,25] have extended this
game-based approach for CTL over 3-valued semantics. In this work, we exploit 
and apply the game-based approach in a completely new direction, for automatic 
CTL verification of variability models. 

The works [11,13] present an approach for software lifted model checking of 
#ifdef-based program families using symbolic game semantics models [10]. 

To conclude, in this work we present a game-based lifted model checking approach for abstract variability models with respect to the full CTL. We also suggest an automatic refinement procedure for the case when the model checking result is indefinite.
Abstract. Modeling and analysis of timing constraints is crucial in real-time automotive systems. Modern vehicles are interconnected through wireless networks, which creates vulnerabilities to external malicious attacks. Violations of cyber-security can cause safety-related accidents and serious damage. To identify the potential impacts of security-related threats on safety properties of interconnected automotive systems, this paper presents analysis techniques that support verification and validation (V&V) of safety & security (S/S) related timing constraints on those systems: probabilistic extensions of S/S timing constraints are specified in PrCCSL (probabilistic extension of clock constraint specification language), and the semantics of the extended constraints is translated into verifiable UPPAAL models with stochastic semantics for formal verification. A set of mapping rules is proposed to facilitate the translation. An automatic translation tool, namely ProTL, is implemented based on the mapping rules. Formal verification is performed on the S/S timing constraints using UPPAAL-SMC under different attack scenarios. Our approach is demonstrated on a cooperative automotive system case study.


Keywords: Automotive system · Safety and security · PrCCSL · UPPAAL-SMC


1 Introduction 


Model based development (MBD) is rigorously applied in automotive systems in which the software controllers interact with physical environments. The continuous-time behaviors of those systems often rely on complex dynamics as well as on stochastic behaviors. Formal verification and validation (V&V) technologies are indispensable and highly recommended for the development of safe and reliable automotive systems [11,12]. Conventional V&V techniques, i.e., testing and model checking, have limitations in assessing the reliability of hybrid systems due to both stochastic and non-linear dynamical features. To ensure the reliability of safety-critical hybrid dynamic systems, statistical model checking (SMC) techniques have been proposed [7,8,19]. These techniques for fully stochastic models validate probabilistic performance properties of given deterministic (or stochastic) controllers in given stochastic environments.

Modern vehicles are being equipped with communication devices and interconnected with each other through wireless networks. Vehicular Ad Hoc Networks (VANET) [28] are the wireless network technologies that establish communication among vehicles and roadside units (RSU). Although vehicular communication contributes to the safety and efficiency of traffic, it introduces vulnerabilities to vehicles. Transmitted information can be corrupted or modified by attackers, resulting in serious safety consequences (e.g., rear-end collision). Analysis of the potential impacts of cyber-security violations on safety properties is crucial in automotive systems. However, traditional automotive system design often addresses the correctness of safety properties without consideration of security breaches. There is still a lack of techniques that enable an integrated analysis of safety & security (S/S) properties. Moreover, message transmission in VANET that pertains to S/S is subject to time deadlines [10]. In this paper, we focus on S/S related timing constraints and propose analysis techniques that support formal verification of interconnected automotive systems.

EAST-ADL [9,22] is an architectural description language for modeling automotive systems. The latest release of EAST-ADL has adopted the time model proposed in the Timing Augmented Description Language (TADL2) [5], which expresses and composes basic timing constraints such as repetition rates and end-to-end delays. TADL2 specializes the time model of MARTE, the UML profile for Modeling and Analysis of Real-Time and Embedded systems [30]. MARTE provides CCSL, a Clock Constraint Specification Language, that supports the specification of both logical and dense timing constraints, as well as functional causality constraints [16,23]. A probabilistic extension of CCSL, called PrCCSL [14], has been proposed to formally specify timing constraints associated with stochastic properties in weakly-hard real-time systems [4], i.e., systems in which a bounded number of constraint violations does not lead to system failure when the results of the violations are negligible.

In this paper, we present a formal analysis of S/S related timing constraints for interconnected automotive systems at the design level: 1. To identify vulnerabilities of automotive systems under malicious attacks, we adopt the behavioral model of a cooperative automotive system (CAS) [13] in UPPAAL-SMC and augment it with models of an RSU-aided (RAISE) communication protocol in VANET and of malicious attacks. The modification results in a refined behavioral model of the system, i.e., one that depicts more details of vehicular communication and security breaches; 2. Probabilistic extensions of S/S timing constraints are specified in PrCCSL, and the semantics of the extended constraints is translated into verifiable models with stochastic semantics for formal verification; 3. A set of mapping rules is proposed to facilitate the translation, based on which an automatic translation tool ProTL is implemented; 4. Formal verification is performed on the S/S timing constraints using UPPAAL-SMC under different attack scenarios.

The paper is organized as follows: Sect. 2 presents an overview of PrCCSL and UPPAAL-SMC. CAS is introduced as a running example in Sect. 3. Section 4.1 presents the UPPAAL-SMC model of CAS complemented with models of the RAISE protocol and three types of attacks. S/S related timing constraints are specified in PrCCSL and translated into verifiable UPPAAL-SMC models in Sect. 5. The applicability of our approach is demonstrated by performing verification on the CAS case study in Sect. 6. Sections 7 and 8 present related work and the conclusion.


2 Preliminary 


In our framework, S/S related timing constraints are specified in PrCCSL. UPPAAL-SMC is employed to perform formal verification on the timing constraints.


2.1 Probabilistic Extension of Clock Constraint Specification 
Language (PrCCSL) 


PrCCSL [14] is a probabilistic extension of CCSL [3,23] for the formal specification of timing constraints associated with stochastic behaviors. In PrCCSL, a clock represents a (possibly infinite) sequence of instants. An event is a clock, and the occurrences of an event correspond to a set of ticks of the clock. PrCCSL provides two types of clock constraints, i.e., expressions and relations, to specify the progression/occurrences of clocks. An expression derives new clocks from already defined clocks [3]. Let c1, c2 ∈ C. The ITE (if-then-else) expression, denoted b ? c1 : c2, defines a new clock that behaves either as c1 or as c2 according to the value of the boolean variable/formula b. DelayFor (denoted ref (d) ⇝ base) results in a new clock by delaying the reference clock ref for d ticks (or d time units) of a base clock. FilterBy (c = base ▼ u(v)) builds a new clock c by filtering the instants of a base clock according to a binary word w = u(v), where u is the prefix and v is the period; "(v)" denotes the infinite repetition of v. This expression results in a clock c such that, ∀k ∈ ℕ+, if the k-th bit of w is 1, then c ticks at the k-th tick of base.
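FilterBy is easy to state operationally on a finite prefix of a run. The Python sketch below assumes that the instants of the base clock are given as a finite list of hypothetical integer timestamps and that binary words are strings of "0"/"1"; it keeps the k-th base instant exactly when the k-th bit of w = u(v) is 1.

```python
from typing import Iterator, List


def word_bits(u: str, v: str) -> Iterator[str]:
    """Bits of the infinite binary word w = u(v): prefix u, then v forever."""
    yield from u
    while True:
        yield from v


def filter_by(base_ticks: List[int], u: str, v: str) -> List[int]:
    """Instants of c = base ▼ u(v) over a finite prefix of base's ticks:
    the k-th tick of base is kept iff the k-th bit of w is 1."""
    if not v:
        raise ValueError("the period v must be non-empty")
    return [t for t, bit in zip(base_ticks, word_bits(u, v)) if bit == "1"]
```

For example, filtering the base instants 1, ..., 7 by the word 0(10) keeps every second tick, namely 2, 4 and 6.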

A relation limits the occurrences among different events; relations are defined based on the notions of run and history. A run corresponds to an execution of the system
model where the clocks tick/progress. The history of a clock c represents the 
number of times the clock c has ticked prior to the current step. 


Definition 1 (Run). A run R consists of a finite set of consecutive steps where a set of clocks tick at each step i. The set of clocks ticking at step i is denoted as R(i), i.e., for all i, 0 ≤ i ≤ n, R(i) ⊆ C, where n is the number of steps of R.


Definition 2 (History). The history of clock c in a run R is a function $H_c^R : \mathbb{N} \to \mathbb{N}$. $H_c^R(i)$ indicates the number of times the clock c has ticked prior to step i in run R, and it is initialized as 0 at step 0. It is defined as: (1) $H_c^R(0) = 0$; (2) $\forall i \in \mathbb{N},\ c \notin R(i) \Rightarrow H_c^R(i+1) = H_c^R(i)$; (3) $\forall i \in \mathbb{N},\ c \in R(i) \Rightarrow H_c^R(i+1) = H_c^R(i) + 1$.

A probabilistic relation in PrCCSL is satisfied if and only if the probability of the relation constraint being satisfied is greater than or equal to the probability threshold p ∈ [0, 1]. Given k runs $\mathcal{R} = \{R_1, \ldots, R_k\}$, the probabilistic subclock, coincidence, exclusion and precedence relations in PrCCSL are defined as follows:


Probabilistic Subclock: $c_1 \subseteq_p c_2 \iff \Pr[c_1 \subseteq c_2] \ge p$, where $\Pr[c_1 \subseteq c_2] = \frac{1}{k}\sum_{j=1}^{k} (R_j \models c_1 \subseteq c_2)$, representing the ratio of runs that satisfy the relation out of k runs. A run $R_j$ satisfies the subclock relation between $c_1$ and $c_2$ if "if $c_1$ ticks, $c_2$ must tick" holds at every step i in $R_j$, i.e., $(R_j \models c_1 \subseteq c_2) \iff (\forall i,\ 0 \le i \le n,\ c_1 \in R_j(i) \Rightarrow c_2 \in R_j(i))$. "$R_j \models c_1 \subseteq c_2$" returns 1 if $R_j$ satisfies $c_1 \subseteq c_2$, and 0 otherwise.


Probabilistic Coincidence: $c_1 =_p c_2 \iff \Pr[c_1 = c_2] \ge p$, where $\Pr[c_1 = c_2] = \frac{1}{k}\sum_{j=1}^{k} (R_j \models c_1 = c_2)$, which represents the ratio of runs that satisfy the coincidence relation out of k runs. A run $R_j$ satisfies the coincidence relation on $c_1$ and $c_2$ if the assertion $\forall i,\ 0 \le i \le n,\ (c_1 \in R_j(i) \Rightarrow c_2 \in R_j(i)) \land (c_2 \in R_j(i) \Rightarrow c_1 \in R_j(i))$ holds. In other words, the coincidence relation is satisfied when the two conditions "if $c_1$ ticks, $c_2$ must tick" and "if $c_2$ ticks, $c_1$ must tick" hold at every step.


Probabilistic Exclusion: $c_1 \#_p c_2 \iff \Pr[c_1 \# c_2] \ge p$, where $\Pr[c_1 \# c_2] = \frac{1}{k}\sum_{j=1}^{k} (R_j \models c_1 \# c_2)$, indicating the ratio of runs that satisfy the exclusion relation out of k runs. A run $R_j$ satisfies the exclusion relation on $c_1$ and $c_2$ if $\forall i,\ 0 \le i \le n,\ (c_1 \in R_j(i) \Rightarrow c_2 \notin R_j(i)) \land (c_2 \in R_j(i) \Rightarrow c_1 \notin R_j(i))$, i.e., at every step, if $c_1$ ticks, $c_2$ must not tick, and vice versa.


Probabilistic Precedence: $c_1 \prec_p c_2 \iff \Pr[c_1 \prec c_2] \ge p$, where $\Pr[c_1 \prec c_2] = \frac{1}{k}\sum_{j=1}^{k} (R_j \models c_1 \prec c_2)$, which denotes the ratio of runs that satisfy the precedence relation out of k runs. A run $R_j$ satisfies the precedence relation if the condition $\forall i,\ 0 \le i \le n,\ (H_{c_1}^{R_j}(i) \ge H_{c_2}^{R_j}(i)) \land \big((H_{c_1}^{R_j}(i) = H_{c_2}^{R_j}(i)) \Rightarrow c_2 \notin R_j(i)\big)$ holds, i.e., the history of $c_1$ is greater than or equal to the history of $c_2$, and $c_2$ must not tick when the histories of the two clocks are equal.
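The four relations and the ratio Pr[...] can be transcribed directly over finite runs. The Python sketch below assumes a run is a list whose i-th entry is the set R(i) of clocks ticking at step i; it only illustrates the definitions, whereas the paper itself verifies these constraints by translating them into STA for UPPAAL-SMC.

```python
from typing import Callable, List, Set

Run = List[Set[str]]  # run[i] is R(i), the set of clocks ticking at step i
Relation = Callable[[Run, str, str], bool]


def history(run: Run, c: str) -> List[int]:
    """h[i] = H_c^R(i): number of ticks of c strictly before step i."""
    h = [0]
    for step in run:
        h.append(h[-1] + (1 if c in step else 0))
    return h


def subclock(run: Run, c1: str, c2: str) -> bool:
    return all(c2 in s for s in run if c1 in s)


def coincidence(run: Run, c1: str, c2: str) -> bool:
    return all((c1 in s) == (c2 in s) for s in run)


def exclusion(run: Run, c1: str, c2: str) -> bool:
    return all(not (c1 in s and c2 in s) for s in run)


def precedence(run: Run, c1: str, c2: str) -> bool:
    h1, h2 = history(run, c1), history(run, c2)
    return all(h1[i] >= h2[i] and (h1[i] != h2[i] or c2 not in s)
               for i, s in enumerate(run))


def probability(runs: List[Run], rel: Relation, c1: str, c2: str) -> float:
    """Pr[rel]: ratio of the k runs that satisfy the relation."""
    return sum(rel(r, c1, c2) for r in runs) / len(runs)


def holds(runs: List[Run], rel: Relation, c1: str, c2: str, p: float) -> bool:
    """The probabilistic relation with threshold p: Pr[rel] >= p."""
    return probability(runs, rel, c1, c2) >= p
```

With the runs [{a}, {b}, {a}, {b}] and [{b}, {a}], precedence of a over b holds in the first run only (in the second, b ticks while both histories are 0), so its probability is 0.5 and the relation holds for p = 0.5 but not for p = 0.6.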


2.2 UPPAAL-SMC 


UPPAAL-SMC [31] performs the probabilistic analysis of properties by monitoring simulations of the complex hybrid system in a given stochastic environment and using results from statistics to determine whether the system satisfies the property with some degree of confidence. UPPAAL-SMC provides a number of queries related to the stochastic interpretation of Timed Automata (STA) [8], as follows, where N and bound indicate the number of simulations to be performed and the time bound on the simulations, respectively: 1. Probability Estimation estimates the probability of a requirement property φ being satisfied for a given STA model within the time bound: Pr[bound] φ; 2. Hypothesis Testing checks whether the probability of φ being satisfied is at least a given probability P0: Pr[bound] φ ≥ P0; 3. Simulations: UPPAAL-SMC runs multiple simulations on the STA model, and the k (state-based) properties/expressions φ1, ..., φk are monitored and visualized along the simulations: simulate N [≤ bound] {φ1, ..., φk}.
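The idea behind probability estimation can be illustrated with a generic Monte Carlo sketch: simulate independent runs, count those satisfying φ, and fix the number of runs with the Chernoff-Hoeffding bound N ≥ ln(2/α) / (2ε²), which guarantees |estimate - p| ≤ ε with probability at least 1 - α. This is a standard SMC technique, not a description of UPPAAL-SMC's internal algorithms; the toy model (an exponentially distributed message delay with mean 100 ms, and φ = "delivered within 200 ms") is purely hypothetical.

```python
import math
import random
from typing import Callable


def runs_needed(epsilon: float, alpha: float) -> int:
    """Chernoff-Hoeffding bound: this many runs give |estimate - p| <= epsilon
    with probability at least 1 - alpha."""
    return math.ceil(math.log(2 / alpha) / (2 * epsilon ** 2))


def estimate(run_satisfies: Callable[[random.Random], bool],
             epsilon: float, alpha: float, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[phi] from independent simulated runs."""
    rng = random.Random(seed)
    n = runs_needed(epsilon, alpha)
    return sum(run_satisfies(rng) for _ in range(n)) / n


# Hypothetical toy model: the delay is exponential with mean 100 ms,
# and phi holds when the message arrives within the 200 ms bound.
def delivered_in_time(rng: random.Random) -> bool:
    return rng.expovariate(1 / 100) <= 200
```

For ε = α = 0.05 the bound asks for 738 runs, and the estimate should land close to the exact value 1 - e^(-2) ≈ 0.865.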


3 Running Example 


A cooperative automotive system (CAS) [13] is adopted to illustrate our approach. CAS includes distributed and coordinated sensing, control, and actuation over three vehicles (denoted as vi, where i ∈ {0, 1, 2}) which are running in the same lane. As shown in Fig. 1, a lead vehicle (v0) runs automatically by recognizing traffic signs on the road. Each following vehicle must set its desired velocity identical to that of its immediately preceding vehicle. Vehicles should maintain sufficient braking distance to avoid rear-end collision while remaining close enough to guarantee communication quality. Vehicle movement relies on the availability of environmental information, e.g., traffic signs, obstacles, etc. The position of vi is represented by the Cartesian coordinates (xi, yi), where xi and yi are distances measured from the vehicle to the two fixed perpendicular lines, i.e., the x-axis and y-axis, respectively.


[Figure: follower vehicles v2 and v1 behind the lead vehicle, annotated with "Safety Distance" and "Traffic Sign Recognition".]

Fig. 1. Overview of Cooperative Automotive System


The cooperative driving of CAS requires prompt and secure information transmission among vehicles. We adopt a roadside unit aided (RAISE) [33] communication protocol in VANET to achieve the data transmission. Each vehicle periodically broadcasts its own position and velocity to its immediate following vehicle through a wireless connection. The authentication of the identity of each vehicle and the verification of messages sent by the vehicles are performed by the RSU. For further details of RAISE, refer to Sect. 4.1. The following S/S properties on CAS are considered:

R1. The follower vehicle should not overtake its leading vehicle when the vehicles run in the positive direction of the x-axis.

R2. When the lead vehicle detects a stop sign, all the three vehicles must stop 
within a given time, e.g., 2000 ms. 
R3. If the distance between a vehicle and its preceding vehicle is less than the minimum safety distance, the vehicle should decelerate within a certain time (200 ms).

R4. If the distance between a vehicle and its preceding vehicle is greater than the maximum safety distance (e.g., 100 m), the vehicle should accelerate within a certain time, e.g., 300 ms.

R5. When the lead vehicle starts to turn left (or turn right), the two follower vehicles should finish turning and run in the same lane within a given time.

R6. Authenticity: If a vehicle receives a message, its preceding vehicle must have sent a corresponding message before, i.e., the protocol should be resistant to message spoofing attacks.

R7. Secrecy: Symmetric keys of vehicles should be kept confidential from attackers.

R8. Integrity: The content of messages must not be modified during transmission, i.e., the protocol should be resistant to message falsification attacks.

R9. Freshness: The vehicles should not accept an “obsolete” message, namely, the 
difference between the current time and the timestamp of the accepted message 
should be less than the predefined time threshold. 

R10. The symmetric key agreement (i.e., mutual authentication) process between 
RSU and three vehicles should be completed within a certain time, e.g., 600 ms. 
R11. A vehicle should send messages to its subsequent vehicle periodically with 
a period 200 ms and a jitter 100 ms. 

Among the above S/S requirements, R1-R5 are safety [20] properties, which
specify that the system should not cause undesirable effects on its environment
and aim at protecting human lives, health, and assets from damage. R6-R11 are
security properties, which refer to the inability of the environment to affect
the system in an undesirable way and aim to guarantee the confidentiality and
integrity of transmitted information. The interdependencies among these S/S
properties are conditional dependencies [17], i.e., violations of security
properties can lead to violations of safety properties. The events associated
with these S/S properties can be interpreted as logical clocks in PrCCSL, which
provides a way to express S/S properties in a logical-time manner [16]. Therefore,
S/S properties can be interpreted as logical timing constraints, i.e., temporal
and causal clock relations in PrCCSL.

The methodology for analyzing S/S related timing constraints in this paper is
summarized in Fig. 2. First, on the basis of the existing behavioral model of CAS
described in [13], we enhance the CAS model by augmenting it (via parallel
composition) with models of the RAISE protocol and of malicious attacks,
resulting in a refined CAS model that captures vehicular communication
characteristics and security-related adversarial interference. Second, we
specify the S/S timing constraints (R1-R11) in PrCCSL and translate the PrCCSL
specifications into corresponding STA and probabilistic queries. Finally, we
combine the CAS model with the STA of the PrCCSL specifications and perform
formal verification on the combined model using UPPAAL-SMC.
Fig. 2. Methodology for analysis of S/S timing constraints


4 Modeling and Refinement of CAS in UPPAAL-SMC 


As described in [13], the behaviors of CAS are modeled in UPPAAL-SMC as a
network of stochastic timed automata (NSTA). In this section, we refine the CAS
model by composing it with models of the RAISE protocol and of security attacks.


4.1 Modeling of RAISE Protocol in UPPAAL-SMC 


We present a simplified version of the RAISE protocol [33] and its UPPAAL-SMC
model. The original RAISE protocol is modified to fit the communication mechanism
of CAS, i.e., each follower vehicle receives messages from its immediately
preceding vehicle and from the RSU. Furthermore, timing constraints are added to
restrict the duration of each step (e.g., encryption and decryption) of the
communication process. The RAISE protocol has two phases: symmetric key
agreement and information transmission.

1. Symmetric key agreement (SKA) establishes a symmetric key $k_i$ that secures
communication, and generates a pseudo identity $ID_i$ for each vehicle to conceal
its real identity. The symmetric key shared between the RSU and $v_i$ is
$k_i = g^{ab}$, where $g$, $a$, and $b$ are three positive random numbers. As
shown in Fig. 3, Encry(msg, k) (Decry(msg, k)) denotes the encryption
(decryption) of message msg with key k, where k can be either a public key or a
symmetric key. Sign(msg, k) generates the signature of msg with a private key k.
We use $PK_i$ to denote the public key of $v_i$ and $SK_i$ to denote the
corresponding private key. "||" is the concatenation operation on messages.

Initially, $v_i$ randomly picks g and a (step 1), encrypts "g||a" with the RSU's
public key, and sends the encrypted result $m_i$ to the RSU (step 2). Upon
receiving $m_i$, the RSU decrypts the message (step 3). It then generates b and
$ID_i$, signs them, and sends the signed message $rm_i$ to $v_i$ (steps 4 and 5).
$v_i$ verifies the signature of $rm_i$ (step 6) and sends back the signature $s_i$
of g||a||b||$ID_i$ (step 7). Finally, the RSU verifies the signature $s_i$
(step 8). If all steps are completed correctly, the key agreement succeeds.
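To make the eight SKA steps concrete, the following Python sketch replays the message flow of Fig. 3 with a purely symbolic treatment of cryptography: encryptions and signatures are tagged tuples rather than real ciphers, and a key is identified by its owner's name. The helper names (`encry`, `decry`, `sign`, `verify`, `ska`) are hypothetical, and deriving the key as $g^{ab} \bmod P$ for an illustrative prime `P` is an assumption layered on the formula $k_i = g^{ab}$; the sketch only shows that both parties end up able to compute the same key.

```python
import random
from typing import Tuple

# Hypothetical public modulus for the toy derivation k_i = g^(a*b) mod P.
# The text only states k_i = g^(ab); reducing modulo a prime is our assumption.
P = 2_147_483_647  # the Mersenne prime 2^31 - 1, chosen only for illustration


def encry(msg, owner: str) -> Tuple:
    """Symbolic encryption under `owner`'s public key (a tagged tuple, not a cipher)."""
    return ("enc", owner, msg)


def decry(ct: Tuple, owner: str):
    """Symbolic decryption: succeeds only with the matching private key."""
    tag, key_owner, msg = ct
    if tag != "enc" or key_owner != owner:
        raise ValueError("wrong key")
    return msg


def sign(msg, owner: str) -> Tuple:
    """Symbolic signature with `owner`'s private key."""
    return ("sig", owner, msg)


def verify(sig: Tuple, owner: str, msg) -> bool:
    """Check a symbolic signature with `owner`'s public key."""
    return sig == ("sig", owner, msg)


def ska(vehicle: str, rng: random.Random) -> Tuple[int, int]:
    """Run steps 1-8 of Fig. 3; return (key derived by v_i, key derived by the RSU)."""
    g, a = rng.randint(2, 1000), rng.randint(2, 1000)             # step 1 (v_i)
    m_i = encry((g, a), "RSU")                                     # step 2 (v_i -> RSU)
    m = decry(m_i, "RSU")                                          # step 3 (RSU)
    b, id_i = rng.randint(2, 1000), f"ID-{rng.randrange(10**6)}"   # step 4 (RSU)
    body = (m, b, id_i)
    rm_i = (body, sign(body, "RSU"))                               # step 5 (RSU -> v_i)
    if not verify(rm_i[1], "RSU", rm_i[0]):                        # step 6 (v_i)
        raise RuntimeError("step 6 failed")
    _, b_recv, id_recv = rm_i[0]                                   # v_i learns b and ID_i
    s_i = sign((g, a, b_recv, id_recv), vehicle)                   # step 7 (v_i -> RSU)
    if not verify(s_i, vehicle, (m[0], m[1], b, id_i)):            # step 8 (RSU)
        raise RuntimeError("step 8 failed")
    key_vehicle = pow(g, a * b_recv, P)   # v_i: its own g, a plus the received b
    key_rsu = pow(m[0], m[1] * b, P)      # RSU: g, a from m_i plus its own b
    return key_vehicle, key_rsu
```

Calling `ska("v1", random.Random(42))` returns two equal integers, one per party. Real RAISE would use actual public-key primitives; the symbolic style is chosen here because it mirrors how the UPPAAL-SMC model abstracts cryptography into success/failure of each step.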


$v_i$: 1. Randomly generate g, a
$v_i$: 2. Compute $m_i$ = Encry(g||a, $PK_{RSU}$); send $m_i$ to the RSU
RSU: 3. Compute m = Decry($m_i$, $SK_{RSU}$)
RSU: 4. Randomly generate b, $ID_i$
RSU: 5. Compute $rm_i$ = m||b||$ID_i$||Sign(m||b||$ID_i$, $SK_{RSU}$); send $rm_i$ to $v_i$
$v_i$: 6. Verify($rm_i$, $PK_{RSU}$)
$v_i$: 7. Compute $s_i$ = Sign(g||a||b||$ID_i$, $SK_i$); send $s_i$ to the RSU
RSU: 8. Verify($s_i$, $PK_i$)

Fig. 3. Symmetric key agreement in RAISE
2. Information transmission (IT) starts after SKA is completed. The traffic
information of $v_i$ (i.e., brake, direction, position, and speed) is integrated
into a message $msg_i = brake_i || direction_i || x_i || y_i || speed_i$. As
presented in Fig. 4, $v_i$ first generates the message authentication code (MAC)
of $msg_i$ with the symmetric key $k_i$ (generated in SKA). Then, $v_i$
concatenates the MAC code with $msg_i$ and sends the result $vm_i$ to the RSU and
$v_{i+1}$ (step 1). Upon receiving $vm_i$, $v_{i+1}$ checks the freshness of the
message (step 2): if the time interval between the current time and the time when
$vm_i$ was sent is greater than the predefined threshold, $v_{i+1}$ drops $vm_i$.
At the same time, the RSU checks the authenticity of $vm_i$ (step 3). If $mac_i$
is correct, the RSU computes the hash code $h_i$ of $msg_i$ (step 4). Afterwards,
it encrypts $h_i$ and sends the encrypted result $hm_i$ to $v_{i+1}$ (step 5).
$v_{i+1}$ decrypts $hm_i$ and obtains the hash code h (step 6). Furthermore, to
ensure the consistency of the message, $v_{i+1}$ itself also computes the hash
code of $msg_i$ (step 7). It then checks whether the hash code it calculated is
the same as the decrypted hash code, and accordingly accepts or rejects $msg_i$
(step 8).

$v_i$: 1. Compute $mac_i$ = MAC($msg_i$, $k_i$); encode $vm_i$ = $msg_i$||$mac_i$; send $vm_i$ to $v_{i+1}$ and the RSU
$v_{i+1}$: 2. Check freshness
RSU: 3. Verify($mac_i$, $k_i$)
RSU: 4. Compute $h_i$ = Hash($msg_i$)
RSU: 5. Compute $hm_i$ = Encry($h_i$, $PK_{RSU}$); send $hm_i$ to $v_{i+1}$
$v_{i+1}$: 6. Compute h = Decry($hm_i$, $SK_{RSU}$)
$v_{i+1}$: 7. Compute hcode = Hash($msg_i$)
$v_{i+1}$: 8. Verify(hcode, h)

Fig. 4. Information transmission in RAISE
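The IT checks can be sketched with Python's standard `hmac` and `hashlib` modules. This is an illustration under stated assumptions, not RAISE's concrete design: HMAC-SHA256 stands in for the unspecified MAC, SHA-256 for Hash, the threshold `FRESHNESS_THRESHOLD` is a hypothetical value, and steps 5-6 (protecting $h_i$ on its way to $v_{i+1}$) are abstracted away by passing the hash directly. The names `VM`, `send`, `rsu_hash`, and `receiver_accepts` are ours.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical freshness threshold in seconds; the text does not give a value.
FRESHNESS_THRESHOLD = 0.5


@dataclass
class VM:
    msg: bytes      # encoded brake||direction||x||y||speed
    mac: bytes      # mac_i = MAC(msg_i, k_i)
    sent_at: float  # send time, used by the freshness check


def mac(key: bytes, msg: bytes) -> bytes:
    # HMAC-SHA256 stands in for the unspecified MAC algorithm (assumption).
    return hmac.new(key, msg, hashlib.sha256).digest()


def send(msg: bytes, k_i: bytes, now: float) -> VM:
    """Step 1: v_i builds vm_i = msg_i || mac_i."""
    return VM(msg, mac(k_i, msg), now)


def rsu_hash(vm: VM, k_i: bytes) -> Optional[bytes]:
    """Steps 3-4: the RSU checks mac_i and returns h_i = Hash(msg_i), or None."""
    if not hmac.compare_digest(mac(k_i, vm.msg), vm.mac):
        return None
    return hashlib.sha256(vm.msg).digest()


def receiver_accepts(vm: VM, h: Optional[bytes], now: float) -> bool:
    """Steps 2, 7, 8 at v_{i+1}: freshness, then compare its own hash with the RSU's."""
    if now - vm.sent_at > FRESHNESS_THRESHOLD:   # step 2: obsolete message -> drop
        return False
    if h is None:                                # the RSU rejected the MAC
        return False
    return hmac.compare_digest(hashlib.sha256(vm.msg).digest(), h)
```

In use, a fresh and untampered message is accepted; the same message delivered 10 s later fails the freshness check; and a payload altered without recomputing the MAC is rejected at the RSU, so $v_{i+1}$ never receives a matching hash.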

To model RAISE in UPPAAL-SMC, the interactions among vehicles and the RSU (i.e.,
sending and receiving messages) are modeled by synchronization channels [31] and
global variables. The cryptographic operations in RAISE involve public- and
private-key encryption and decryption, i.e., a message encrypted with a public
key can be decrypted using the corresponding private key, and vice versa. The
automaton of a cryptographic device [6] is adopted to model encryption and
decryption. Figure 5 presents the STA capturing the behaviors of vehicle $v_i$
and the RSU in SKA. startEn (resp. startDe) and finEn (resp. finDe) are channels
indicating the start and finish of encryption (resp. decryption). The
encryption/decryption result is denoted en_res/de_res. In the STA, the names of
locations indicate the corresponding steps pictured in Fig. 3.

The IT phase from $v_0$ to $v_1$ is carried out with the help of the RSU and is
modeled by the STA shown in Fig. 6 (the transmission from $v_1$ to $v_2$ is
modeled similarly). The behaviors of $v_0$ (sender), $v_1$ (receiver), and the
RSU in the IT phase are modeled by the IT_v0, IT_v1, and IT_RSU STA, respectively.

SKA (or IT) succeeds if each of its steps is completed correctly within a given
time interval, modeled by the invariant "t < d" (the value of d varies between
steps). If a timeout occurs (i.e., "t > d"), the fail location is reached and
the procedure restarts from the initial step.
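The timeout-and-restart behavior can be illustrated without UPPAAL by replaying attempts whose per-step durations are given. In this hypothetical sketch, `attempts[j][s]` is how long step s took in attempt j and `deadlines[s]` is its bound d; an attempt succeeds only if every step finishes before its d, otherwise it counts as reaching the fail location and the next attempt starts from step 1.

```python
from typing import Sequence


def attempts_until_success(attempts: Sequence[Sequence[float]],
                           deadlines: Sequence[float]) -> int:
    """Return the 1-based index of the first attempt whose steps all meet their bounds.

    A step that does not finish before its bound d models the timeout edge: the
    automaton reaches the fail location and restarts from the initial step.
    """
    for j, durations in enumerate(attempts, start=1):
        if len(durations) != len(deadlines):
            raise ValueError("one duration per step is required")
        if all(t < d for t, d in zip(durations, deadlines)):
            return j
    raise RuntimeError("no attempt completed within the time bounds")
```

For example, with two steps bounded by 3.0 each, durations `[[1.0, 5.0], [1.0, 2.0]]` succeed on the second attempt, because the second step of the first attempt times out.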
Fig. 6. UPPAAL-SMC model of IT


4.2 Modeling of Attacks in UPPAAL-SMC 


We present models of three types of attacks commonly considered in security
analysis, i.e., message falsification, message replaying, and message spoofing
attacks [2]. The attack models are illustrated in Fig. 7, where the parameter ls
($ls \in [0, 100]$) indicates the adversary's strength level and qc
($qc \in [0, 100]$) indicates the quality of the adversarial channel.


Message Falsification Attack (MFA) aims to falsify messages transmitted
from $v_i$ to $v_{i+1}$ and is modeled as the MFA STA in Fig. 7. As described
earlier, in RAISE the RSU verifies the authenticity of messages by checking the
correctness of their MAC codes. To deceive the RSU about the validity of the
modified message and avoid exposing itself, the MFA attempts to obtain the
symmetric key and uses it to compute the MAC code of the falsified message. In
state s1, the MFA eavesdrops on $rm_i$ (generated at step 5 in Fig. 3), which
contains the information for symmetric key generation (i.e., g, a, b). It tries
to decrypt $rm_i$ when receiving it via sendrm[i]?. The decryption succeeds with
probability ls%, modeled by probabilistic choices [31] (dashed edges) with
probability weights ls and 100 − ls. If the decryption succeeds, the MFA obtains
the symmetric key of $v_i$ from the decrypted result (getKey(de_res)). Finally, it
modifies the content of the message using the key and tries to send the modified
message to $v_{i+1}$ (sendvm[i]!). The message is sent successfully with
probability (100 − qc)%. In our setting, the MFA changes the $speed_i$ field of
the message to a random value in [100, 120] and sets $direction_i = 4$, which
indicates that $v_i$ is traveling in the positive direction of the y-axis.
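The two weighted choices in the MFA STA combine multiplicatively for a single pass, which a short Monte Carlo sketch can confirm. The function names are hypothetical, and `rng.randrange(100) < ls` is our way of drawing an event of probability ls%. With the experimental setting ls = 10 and qc = 90, one pass succeeds with probability 0.1 × 0.1 = 0.01. This per-attempt figure is a different quantity from the reachability probabilities reported in Sect. 6, which are estimated over whole runs up to the time bound.

```python
import random


def mfa_attempt(ls: int, qc: int, rng: random.Random) -> bool:
    """One pass through the MFA STA's two probabilistic choices."""
    # Decrypting rm_i: weights ls : (100 - ls)
    if rng.randrange(100) >= ls:
        return False
    # Sending the falsified message: succeeds with probability (100 - qc)%
    return rng.randrange(100) < 100 - qc


def single_attempt_probability(ls: int, qc: int) -> float:
    """Closed form: both independent choices must succeed."""
    return (ls / 100) * ((100 - qc) / 100)
```

Averaging `mfa_attempt(10, 90, rng)` over a few hundred thousand seeded trials lands close to `single_attempt_probability(10, 90)`.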



Fig. 7. STA of attacks 


Message Replaying Attack (MRA) aims to replay obsolete messages that
contain old information. The MRA STA represents an MRA that replays messages
sent by $v_i$. Upon capturing a message (via sendvm[i]?), the MRA stores it
(m = vm[i]) and tries to replay it at a later time (i.e., after 10 s). The
replay succeeds with probability (100 − qc)%.


Message Spoofing Attack (MSA) impersonates a vehicle ($v_i$) in order to
inject fraudulent information into its subsequent vehicle ($v_{i+1}$). Similar
to the MFA, the MSA STA first obtains the symmetric key of $v_i$ by intercepting
and decrypting $rm_i$. It then fabricates a new message whose content is
"$brake_i = 0$, $speed_i = 0$, $direction_i = 4$, $x_i = 0$, $y_i = 10$"
(denoted "encode(i)") and tries to send it to $v_{i+1}$ (sendvm[i]!); the message
is sent successfully with probability (100 − qc)%.
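Why a leaked key is enough for spoofing follows from what the RSU actually checks: only that the MAC matches $v_i$'s key. The sketch below makes this explicit, again with HMAC-SHA256 standing in for the unspecified MAC; the key value and the byte encoding of encode(i) are hypothetical. A forged message tagged with the real $k_i$ passes, while the same message tagged with a guessed key does not, which is how a secrecy violation (R7) turns into an authenticity violation (R6).

```python
import hashlib
import hmac


def mac(key: bytes, msg: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in MAC (assumption; the text does not fix one).
    return hmac.new(key, msg, hashlib.sha256).digest()


def rsu_accepts(msg: bytes, tag: bytes, k_i: bytes) -> bool:
    """The RSU's step-3 check: is `tag` a valid MAC of `msg` under v_i's key k_i?"""
    return hmac.compare_digest(mac(k_i, msg), tag)


# Hypothetical values: v_i's agreed key and an encoding of encode(i).
k_i = b"k_i agreed during SKA"
spoofed = b"brake=0;speed=0;direction=4;x=0;y=10"

with_leaked_key = rsu_accepts(spoofed, mac(k_i, spoofed), k_i)        # MSA decrypted rm_i
with_guessed_key = rsu_accepts(spoofed, mac(b"guess", spoofed), k_i)  # it did not
```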


5 Representation of S/S Related Timing Constraints in UPPAAL-SMC


To enable formal verification of the S/S related timing constraints given in
Sect. 3, we first investigate how to specify these constraints in PrCCSL. We then
show how the PrCCSL specifications of the constraints are translated into
verifiable STA. Finally, we introduce ProTL, a tool that automates the
transformation based on the proposed translation rules.


5.1 Specifications of S/S Related Timing Constraints in PrCCSL 


The specifications of R1-R11 are presented in Table 1, where ac is a clock that
always ticks and nc is a clock that never ticks. R1 is specified as an exclusion
relation between xdir (the event that the vehicles are traveling in the positive
direction of the x-axis) and ovtake (the event that the x-position of follower
$v_1$ is greater than that of leader $v_0$). Similarly, R7 and R9 are specified
as exclusion relations.

In the specification of R2, stopD is a clock generated by delaying stopSign
(the event that the leader vehicle detects a stop sign) by 2000 ms. vstop refers
to the event that all three vehicles have completely stopped, which should occur
no later than stopD. Hence, R2 is expressed as a causality relation between vstop
and stopD. R3-R5 are specified in a similar manner.

Table 1. PrCCSL specifications of R1-R11

Req   PrCCSL specification
R1    $xdir \triangleq dir = 1\ ?\ ac : nc$, $ovtake \triangleq x_1 > x_0\ ?\ ac : nc$, $xdir \ \#_{0.95}\ ovtake$
R2    $stopSign \triangleq sign = 5\ ?\ signRec : nc$, $stopD \triangleq stopSign\ (2000) \rightsquigarrow ms$, $vstop \prec_{0.95} stopD$
R3    $vUnsafeDe \triangleq vUnsafe\ (200) \rightsquigarrow ms$, $vDec \prec_{0.95} vUnsafeDe$
R4    $vFarDisDe \triangleq vFarDis\ (300) \rightsquigarrow ms$, $startAcc \prec_{0.95} vFarDisDe$
R5    $v0TurnDe \triangleq v0Turn\ (3000) \rightsquigarrow ms$, $finTurn \prec_{0.95} v0TurnDe$
R6    $msgRec \subseteq_{0.95} msgSent$
R7    $leakK \ \#_{0.95}\ ac$
R8    $validMsg \triangleq rMsg = sMsg\ ?\ msgRec : nc$, $msgRec =_{0.95} validMsg$
R9    $oldMsg \triangleq time - ts > thre\ ?\ msgAcpt : nc$, $msgAcpt \ \#_{0.95}\ oldMsg$
R10   $startSKADe \triangleq startSKA\ (600) \rightsquigarrow ms$, $finSKA \prec_{0.95} startSKADe$
R11   $fclk \triangleq msgSent \blacktriangledown 0(1)$, $sentDe1 \triangleq msgSent\ (100) \rightsquigarrow ms$, $sentDe2 \triangleq msgSent\ (300) \rightsquigarrow ms$, $sentDe1 \prec_{0.95} fclk$, $fclk \prec_{0.95} sentDe2$

R6 (authenticity) is expressed as a subclock relation between msgRec and
msgSent, where msgRec (msgSent) represents the event that a message is received
(sent) by the follower (leader) vehicle. R8 is specified as a coincidence
relation between msgRec and validMsg, where validMsg is a clock that ticks with
msgRec when the received message rMsg is identical to the sent message sMsg
(i.e., rMsg == sMsg). For R10, startSKA (finSKA) represents the start
(completion) of SKA, and startSKADe is a clock constructed by delaying startSKA
by 600 ms. R10 requires that finSKA occur before startSKADe. R11 states that two
consecutive occurrences of msgSent must be separated by an interval in
[period − jitter, period + jitter] ms (i.e., [100, 300] ms). In the specification
of R11, fclk is a clock generated by filtering out the first tick of msgSent, and
sentDe1 and sentDe2 are clocks generated by delaying msgSent by 100 ms and
300 ms, respectively. R11 can thus be interpreted as: $\forall i \in \mathbb{N}^+$,
the $i$-th tick of fclk should occur later than the $i$-th tick of sentDe1 but
before the $i$-th tick of sentDe2.
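The interpretation of R11 can be checked directly on the msgSent timestamps of a single run. This sketch assumes the base clock ms ticks every millisecond, so delaying a tick by d ms adds d to its time, and it reads ≺ as strict precedence comparing the i-th ticks of both clocks; `r11_holds` is a hypothetical helper, not part of ProTL.

```python
from typing import Sequence

PERIOD, JITTER = 200, 100  # ms, from R11


def r11_holds(msg_sent: Sequence[float]) -> bool:
    """Check R11 on one run, given the tick times (in ms) of msgSent."""
    # fclk = msgSent filtered by 0(1): drop the 1st tick, keep all later ones
    fclk = list(msg_sent[1:])
    # sentDe1 / sentDe2 = msgSent delayed by 100 ms / 300 ms, assuming the base
    # clock ms ticks every millisecond so that delaying by d ms adds d
    sent_de1 = [t + (PERIOD - JITTER) for t in msg_sent]
    sent_de2 = [t + (PERIOD + JITTER) for t in msg_sent]
    # sentDe1 ≺ fclk and fclk ≺ sentDe2, compared tick by tick (i-th with i-th)
    return all(d1 < f < d2 for f, d1, d2 in zip(fclk, sent_de1, sent_de2))
```

A run sending at 0, 200, 390, and 600 ms satisfies the check, while gaps of 350 ms or 50 ms violate it. With ≺ read strictly, a gap of exactly 100 or 300 ms also fails in this sketch.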


5.2 Translation of PrCCSL into STA 


We now show how the S/S related timing constraints specified in PrCCSL can be
transformed into STA and probabilistic queries in UPPAAL-SMC. We first describe
how clock ticks and histories (introduced in Sect. 2) are represented in
UPPAAL-SMC. Using this mapping, we then show how PrCCSL expressions and relations
are translated into STA and queries.




In earlier work [14], the semantics of PrCCSL operators were translated into STA
based on discrete time, i.e., continuous physical time was discretized into a set
of equal steps. As a result, two clock instants are considered coincident even if
they are one time step apart. To remove this restriction and represent PrCCSL
with continuous real-time semantics, we refine the mapping patterns: two clock
instants are coincident only if the time difference between them is
insignificant, i.e., less than a positive infinitesimal value ε, e.g.,
ε = 0.000001.
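The difference between the two mappings can be shown with two small predicates. `coincident` follows the refined rule with ε = 0.000001; `coincident_discrete` is a simplified stand-in for the earlier discrete-time mapping, assuming a unit step, in which instants up to one step apart collapse. Both names are hypothetical.

```python
EPS = 1e-6  # the infinitesimal bound from the text (ε = 0.000001)


def coincident(t1: float, t2: float, eps: float = EPS) -> bool:
    """Continuous-time mapping: instants coincide only if they differ by less than eps."""
    return abs(t1 - t2) < eps


def coincident_discrete(t1: float, t2: float, step: float = 1.0) -> bool:
    """Simplified model of the earlier discrete mapping: instants up to one step apart collapse."""
    return abs(t1 - t2) <= step
```

For instance, instants at 3.0 and 4.0 are treated as coincident by the discrete predicate but not by the refined one, whereas 3.0 and 3.0000005 coincide under both.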

In PrCCSL, a logical clock represents an event, and the instants of the clock
correspond to occurrences of the event. A logical clock c is represented as a
synchronization channel c! in UPPAAL-SMC. The history of c is modeled by the STA
shown in Fig. 8: whenever c occurs (c?), the value of its history is increased
by 1 (i.e., h++).
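The history STA of Fig. 8 is a counter, which a few lines of Python mirror. `History` and `history_after` are hypothetical names; `history_after` returns the value of h once every tick at or before time t has been processed, which is the value the STA holds right after the tick at t has been handled.

```python
from typing import Iterable


class History:
    """Mirrors the history STA of Fig. 8: on every tick of c (c?), h++."""

    def __init__(self) -> None:
        self.h = 0

    def on_tick(self) -> None:
        self.h += 1


def history_after(ticks: Iterable[float], t: float) -> int:
    """Value of h once every tick of c at or before time t has been processed."""
    hist = History()
    for tick in sorted(ticks):
        if tick > t:
            break
        hist.on_tick()
    return hist.h
```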

Based on the mapping patterns for ticks and histories, the PrCCSL expressions
(ITE, DelayFor, and FilterBy) and relations (subclock, coincidence, exclusion,
and precedence) can be represented as the STA and queries shown in Fig. 9.

The STA of expressions trigger the ticks of a new clock (denoted res!) based on
the occurrences of existing clocks. To represent relations, observer STA that
capture the semantics of the standard subclock, coincidence, exclusion, and
precedence relations are constructed. Each observer STA contains a "fail"
location (see Fig. 9), which indicates a violation of the corresponding relation.
Recall from Sect. 2 that the probability of a relation being satisfied is
interpreted as the ratio of runs that satisfy the relation among all runs. It is
specified as a Hypothesis Testing query in UPPAAL-SMC, $H_0: m/k \ge p$ against
$H_1: m/k < p$, where m is the number of runs satisfying the given relation out of
k runs in total. As a result, a probabilistic relation is interpreted as the query
Pr[bound]([] ¬STA.fail) ≥ p (see Fig. 9), which means that the probability that
the "fail" location of the observer STA is never reached should be greater than
or equal to p. The STA of expressions and relations are composed in parallel with
the system NSTA. Probabilistic analysis is then performed over the composite
NSTA, which enables us to verify the S/S related timing constraints over the
entire system using UPPAAL-SMC.
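The ratio m/k can be estimated with a plain Monte Carlo loop, which helps to see what the query measures. In this sketch, each run is a hypothetical callable that returns True if the observer never reached its fail location. The fixed-sample comparison `m/k >= p` is a simplification: UPPAAL-SMC answers hypothesis-testing queries with a sequential test under configurable error bounds rather than a fixed number of runs.

```python
import random
from typing import Callable

Run = Callable[[random.Random], bool]  # True if the observer never reached "fail"


def estimate(run: Run, k: int, seed: int = 0) -> float:
    """m / k: fraction of k simulated runs that satisfy the relation."""
    rng = random.Random(seed)
    m = sum(1 for _ in range(k) if run(rng))
    return m / k


def holds(run: Run, p: float, k: int = 10_000, seed: int = 0) -> bool:
    """Naive fixed-sample check of m/k >= p (UPPAAL-SMC uses a sequential test)."""
    return estimate(run, k, seed) >= p
```

A toy run that fails 2% of the time passes the threshold p = 0.95, while one that fails 10% of the time does not.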


Fig. 8. History 


Tool support: Manual translation of PrCCSL specifications into UPPAAL models for
verification can be time-consuming and error-prone. To improve the accuracy and
efficiency of translation, we implement a tool, ProTL (Probabilistic-CCSL
TransLator) [26], that provides push-button transformation from PrCCSL
specifications into the corresponding STA and queries. Furthermore, ProTL
supports verification and simulation by employing UPPAAL-SMC as the back-end
analysis engine. ProTL offers the following features: (1) an editor for editing
PrCCSL specifications of requirements (stored as "tat" files); (2) automated
transformation of PrCCSL specifications into UPPAAL-SMC STA; (3) integration of
the STA with the system behavioral model (imported by users); (4) a configuration
palette for setting the parameters used for verification and simulation (e.g.,
time bound of simulation, number of simulations); (5) automatic generation of
probabilistic queries (introduced in Sect. 2) based on user-specified parameters;
(6) verification and simulation of PrCCSL specifications against the integrated
model and the generated queries.

DelayFor: $res \triangleq ref\ (d) \rightsquigarrow base$. When ref occurs (ref?), the
Detect STA spawns a DelayFor STA. The spawned STA stays in the detect location
until base has ticked d times. When base has ticked d times (x == d), it moves to
the tick location and triggers res (res!). It then becomes inactive (denoted
"exit()"), i.e., the computation of the current tick of res is complete.

ITE (if-then-else): $res \triangleq b\ ?\ c_1 : c_2$. ITE generates a new clock res
that behaves either as c1 or as c2 depending on the value of the Boolean variable
b. If b is true (b == 1), res ticks (res!) whenever c1 occurs (c1?). Otherwise
(b == 0), res ticks with c2.

FilterBy: $res \triangleq base \blacktriangledown u(v)$. FilterBy filters the
instants of base according to a binary word w = u(v), i.e., $\forall k \in \mathbb{N}$,
if the k-th bit of w is 1, then res ticks at the k-th tick of base. u and v are
two Boolean arrays, and lu and lv denote their sizes. As base ticks (base?), the
STA first traverses the bits of u (in the prefix location) and then iterates over
the bits of v (in the period location). If the current bit of the binary word
(indicated by the index) is 1, the STA triggers res (res!); otherwise, res does
not tick. The STA then returns to the initial location, updates the index to the
next bit of w (i++/j++), and repeats the process.

Probabilistic coincidence: $c_1 =_p c_2$. When c1 (c2) ticks via c1? (c2?), the
STA checks whether the other clock, c2 (c1), ticks at the same time. If c2 (c1)
occurs within a positive infinitesimal value (t <= ε), the STA moves to the
success location. Otherwise, the coincidence relation is violated and the STA
moves to the fail location. Probabilistic coincidence is expressed as
Pr[bound]([] ¬Coincidence.fail) ≥ p.

Probabilistic subclock: $c_1 \subseteq_p c_2$. The relation requires that c2 (the
superclock) ticks whenever c1 (the subclock) ticks, i.e., when c1 ticks, c2 must
coincide with c1. When c1 (c2) occurs, the STA checks whether the other clock
also ticks at the same time. If c1 ticks but c2 does not occur within ε time
units, the relation is violated and the STA moves to the fail location.
Probabilistic subclock is expressed as Pr[bound]([] ¬Subclock.fail) ≥ p.

Probabilistic exclusion: $c_1 \#_p c_2$. When c1 (c2) ticks via c1? (c2?), the STA
checks whether the other clock, c2 (c1), ticks at the same time, i.e., whether it
occurs while t < ε. If it does, the exclusion relation is violated and the STA
moves to the fail location. Probabilistic exclusion is expressed as
Pr[bound]([] ¬Exclusion.fail) ≥ p.

Probabilistic precedence: $c_1 \prec_p c_2$. The relation states that c1 must run
faster than c2, i.e., the history of c1 (h1) must be greater than or equal to the
history of c2 (h2), and c2 must not tick when the histories of the two clocks are
equal. Therefore, if c1 ticks (c1?) while running slower (h1 < h2), or c2 ticks
(c2?) when the histories are equal (h1 == h2), the precedence relation is
violated and the fail location is reached. Probabilistic precedence is expressed
as Pr[bound]([] ¬Precedence.fail) ≥ p.

Fig. 9. STA of PrCCSL operators
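The operators in Fig. 9 are defined by automata, but their effect on a single finite run can also be stated as functions over sorted lists of tick times. The sketch below takes that view. The function names are hypothetical, and three choices are our assumptions: DelayFor does not count a base tick that coincides with the ref tick, ITE's condition b is a function of time evaluated at each tick, and the precedence observer treats simultaneous c1 and c2 ticks with equal histories as a violation.

```python
from typing import Callable, List, Sequence


def delay_for(ref: Sequence[float], base: Sequence[float], d: int) -> List[float]:
    """res ≜ ref (d) ⇝ base: for each tick of ref, res ticks at the d-th later tick of base."""
    if d < 1:
        raise ValueError("d must be positive")
    base_sorted = sorted(base)
    res = []
    for r in ref:
        later = [b for b in base_sorted if b > r]  # assumption: a base tick at r is not counted
        if len(later) >= d:
            res.append(later[d - 1])
    return sorted(res)


def ite(b: Callable[[float], bool], c1: Sequence[float], c2: Sequence[float]) -> List[float]:
    """res ≜ b ? c1 : c2, with b evaluated at each tick time."""
    return sorted([t for t in c1 if b(t)] + [t for t in c2 if not b(t)])


def filter_by(base: Sequence[float], u: Sequence[int], v: Sequence[int]) -> List[float]:
    """res ≜ base ▼ u(v): keep the k-th tick of base iff the k-th bit of u v v v ... is 1."""
    res = []
    for k, t in enumerate(sorted(base)):
        bit = u[k] if k < len(u) else v[(k - len(u)) % len(v)]
        if bit == 1:
            res.append(t)
    return res


def precedence_ok(c1: Sequence[float], c2: Sequence[float]) -> bool:
    """Observer for c1 ≺ c2 over one run: fail if c2 ticks while h1 == h2."""
    # At equal times the c2 tick is processed first, so simultaneous ticks with
    # equal histories count as a violation (strict precedence).
    events = sorted([(t, 0) for t in c2] + [(t, 1) for t in c1])
    h1 = h2 = 0
    for _, from_c1 in events:
        if from_c1:
            h1 += 1
        elif h1 == h2:
            return False  # the observer would move to its fail location
        else:
            h2 += 1
    return True
```

Composing them mirrors Table 1. For R2, for example, `stop_d = delay_for(stop_sign, ms_ticks, 2000)` followed by `precedence_ok(vstop, stop_d)` checks one run, and repeating that over many simulated runs gives the ratio that the probabilistic query thresholds.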

The GUI of ProTL is implemented using the Python package Tkinter [27]. The
translator is implemented with ANTLR (ANother Tool for Language Recognition)
[24], a parser generator that constructs parsers for a language from a
user-defined grammar of that language. We specified the syntax of PrCCSL in
Backus-Naur Form (BNF) and applied ANTLR to generate a parser that analyzes and
recognizes text written in PrCCSL. The parser reads PrCCSL specifications and
generates abstract syntax trees (ASTs), i.e., intermediate tree-structured
representations. By traversing the ASTs, the information of the PrCCSL
specifications (i.e., operators and parameters) is extracted and used to
generate the corresponding STA.
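The parse-then-extract pipeline can be illustrated without ANTLR for a single expression form. The following sketch is a dependency-free stand-in, not ProTL's grammar: it assumes a hypothetical ASCII concrete syntax such as `stopD = stopSign (2000) ~> ms` for DelayFor and turns each matching line into a small AST node whose fields (result clock, reference clock, delay, base clock) are what an STA generator would need.

```python
import re
from dataclasses import dataclass


@dataclass
class DelayFor:
    res: str   # new clock being defined
    ref: str   # clock whose ticks are delayed
    d: int     # number of base ticks to wait
    base: str  # clock that counts the delay


# Hypothetical ASCII syntax "res = ref (d) ~> base"; ProTL's real BNF is not reproduced here.
DELAY_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\(\s*(\d+)\s*\)\s*~>\s*(\w+)\s*$")


def parse_delay_for(line: str) -> DelayFor:
    """Parse one DelayFor definition into an AST node, or raise ValueError."""
    m = DELAY_RE.match(line)
    if m is None:
        raise ValueError(f"not a DelayFor definition: {line!r}")
    res, ref, d, base = m.groups()
    return DelayFor(res, ref, int(d), base)
```

A grammar-based parser such as one generated by ANTLR scales to the full operator set and gives proper error reporting. The regular expression only stands in for one production.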


6 Experiment 


To identify vulnerabilities of the system to external malicious attackers, we
combine the refined CAS model (including the models of the RAISE protocol) with
models of three different attackers. Formal verification of the S/S related
timing constraints (R1-R11) on the combined model is performed with UPPAAL-SMC.
The combined CAS model contains stochastic behaviors arising from the
unpredictable environment (e.g., traffic signs are randomly recognized by the
leader vehicle of CAS, and the probability of each sign type occurring is set
equally to 16.7%), as well as the probabilistic behaviors modeled by weighted
probabilistic choices in the attack STA (see Fig. 7). In our setting, ls and qc
are set to 10 and 90, respectively. To estimate the probability of an attack
being launched successfully on CAS, a Probability Estimation query is used to
check the probability that the "attack" location in each attack STA is reachable
in the system NSTA. The time bound of the verification is set to 10000. The
probabilities of the message falsification, message replaying, and message
spoofing attacks being successfully completed by the corresponding attackers lie
within the ranges [0.109, 0.209], [0.563, 0.663], and [0.143, 0.243], respectively.
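Each reported range has width 0.1, which is consistent with an estimate of ±0.05. The classic Chernoff-Hoeffding bound used for Monte Carlo probability estimation in statistical model checking relates that half-width ε and the confidence 1 − α to the number of runs N: $P(|\hat{p} - p| \ge \varepsilon) \le 2e^{-2N\varepsilon^2} \le \alpha$, so $N \ge \ln(2/\alpha)/(2\varepsilon^2)$. The text does not state ε or α, so the values below are assumptions.

```python
import math


def chernoff_runs(eps: float, alpha: float) -> int:
    """Runs needed so that P(|p_hat - p| >= eps) <= alpha (Chernoff-Hoeffding bound)."""
    return math.ceil(math.log(2 / alpha) / (2 * eps ** 2))
```

With ε = 0.05 and α = 0.05, this gives 738 runs per estimate. Halving ε roughly quadruples the required number of runs.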

In our experiments, the S/S related timing constraints are specified in PrCCSL
and transformed into STA using ProTL. Each constraint is specified as a PrCCSL
relation (as described in Sect. 5.1) with a probability threshold of 95%. The
verification results are shown in Table 2, where "✓" denotes that the
corresponding requirement is satisfied and "✗" indicates that it is violated.
Under the message replaying attack, all S/S timing constraints are satisfied with
the 95% probability threshold. Under the message falsification attack, the
secrecy and integrity properties (R7 and R8), as well as three safety properties
(R3-R5), are violated. The MSA compromises the authenticity (R6) and secrecy (R7)
of communication and leads to violations of four safety properties, i.e., R1 and
R3-R5.


Table 2. Verification results of timing constraints under different attacks

Attack                 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11  Average time  Mem (Mb)
Message falsification  ✓  ✓  ✗  ✗  ✗  ✓  ✗  ✗  ✓  ✓   ✓    40.20         57.94
Message replaying      ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓   ✓    68.33         61.49
Message spoofing       ✗  ✓  ✗  ✗  ✗  ✗  ✗  ✓  ✓  ✓   ✓    58.11         40.23


The experimental results indicate the severity of the impact of the demonstrated
attacks on the safety and security of CAS: no requirement is violated under the
MRA scenario, whereas the MSA causes violations of most safety properties. When
CAS is composed with the STA of the MSA or the MFA, the secrecy of the symmetric
key is violated. With the obtained symmetric key, the MSA can masquerade as a
legitimate vehicle and the MFA can tamper with the content of messages without
being detected, leading to violations of authenticity (R6) and integrity (R8),
respectively. To explore how malicious attackers can influence the safety of the
system, we run simulations using Simulation queries. The simulation results in
Fig. 10 illustrate how an MSA drives the system into undesirable states.
Fig. 10. Simulation results of R1 and R4: (a) At Time = 2345, the attack occurs
(indicated by the rising edge of the red line). The MSA sends fabricated position
information of $V_0$ to $V_1$ (the value of recz becomes 0), which tricks $V_1$
into believing that the distance between $V_0$ and $V_1$ exceeds the maximum
limit. $V_1$ keeps increasing its speed (speed1), leading to a collision
(indicated by x0 == x1) at Time = 3815, which violates R1. (b) When an attack
takes place at Time = 2496 (indicated by the rising edge of the blue line), $V_1$
receives the message from the attacker and is deluded into believing that the
speed of $V_0$ is 0. Therefore, $V_1$ keeps decreasing its speed even when the
distance between $V_0$ and $V_1$ becomes greater than 100 m, which violates R4.
(Color figure online)


7 Related Work 


Formal verification of (non-)functional properties of automotive systems
containing stochastic behaviors has been investigated in several works [13-15].
In these works, systems are assumed by default to be resilient to security
threats, and safety properties are analyzed only in scenarios without malicious
attacks, which is inadequate for the design of automotive systems interconnected
via wireless communication. Combined analysis of safety and security (S/S)
properties for interconnected cyber-physical systems has been addressed in
earlier works [1,21,29], which are, however, limited to theoretical frameworks
and high-level descriptions of S/S properties without support for formal
verification. Pedroza et al. [25] proposed a SysML-based environment called
AVATAR for the formal verification of S/S properties, which enables assessment
of the impact of cyber-security threats on functional safety. Wardell et al. [32]
proposed an approach for identifying security vulnerabilities of industrial
control systems by modeling malicious attacks as PROMELA models amenable to
formal verification. However, these approaches lack precise probabilistic
annotations specifying stochastic properties of S/S aspects. Kumar et al. [18]
introduced the attack-fault tree formalism for describing attack scenarios and
conducted formal analysis using UPPAAL-SMC to obtain quantitative estimates of
the impact of system failures or security threats. In contrast, our work is based
on a probabilistic extension of S/S related timing constraints, with a focus on
probabilistic verification of the extended constraints.


8 Conclusion 


This paper presents a model-based approach for probabilistic formal analysis of
safety and security (S/S) related timing constraints of interconnected automotive
systems in EAST-ADL at the early design phase. The behavioral model of the
automotive system in UPPAAL-SMC is refined by adding models of the vehicular
communication protocol and of malicious attacks, which makes it possible to
explore the impact of an adversarial environment on the S/S of the system. Timing
constraints are specified in PrCCSL and translated into stochastic timed automata
(STA) amenable to formal verification using UPPAAL-SMC. A set of translation
rules from PrCCSL to STA, as well as tool support for automating the translation,
is provided. We demonstrate our approach by performing formal verification on a
cooperative automotive system (CAS) case study. Although we have shown one-to-one
mapping patterns from a subset of PrCCSL operators to STA for conducting formal
verification of timing constraints using UPPAAL-SMC, systematic and formal
translation techniques covering the full set of PrCCSL constraints are being
studied as ongoing work. Furthermore, new features of ProTL for analyzing
UPPAAL-SMC models that involve a wider range of variable and query types (e.g.,
urgent channels, bounded integers) are being developed.
Abstract. Verifying whether a procedure is observationally pure (that
is, it always returns the same result for the same input argument) is 
challenging when the procedure uses mutable (private) global variables, 
e.g., for memoization, and when the procedure is recursive. 

We present a deductive verification approach for this problem. Our 
approach encodes the procedure’s code as a logical formula, with recur- 
sive calls being modeled using a mathematical function symbol assum- 
ing that the procedure is observationally pure. Then, a theorem prover is 
invoked to check whether this logical formula agrees with the function 
symbol referred to above in terms of input-output behavior for all argu- 
ments. We prove the soundness of this approach. 

We then present a conservative approximation of the first approach 
that reduces the verification problem to one of checking whether a 
quantifier-free formula is satisfiable and prove the soundness of the sec- 
ond approach. 

We evaluate our approach on a set of realistic examples, using the 
Boogie intermediate language and theorem prover. Our evaluation shows 
that the invariants are easy to construct manually, and that our approach 
is effective at verifying observationally pure procedures. 


1 Introduction 


A procedure in an imperative programming language is said to be observationally
pure (OP) if for each specific argument value it has a specific return value, across
all possible sequences of calls to the procedure, irrespective of what other code
runs between these calls. In other words, the input-output behavior of an OP
procedure mimics a mathematical function.


A deterministic procedure that does not read any pre-existing state other
than its arguments is trivially OP. However, it is common for procedures to
update and read global variables, typically for performance optimization, while
still being OP. In this paper, we focus on the problem of checking observational
purity of procedures that read and write global variables, especially in the
presence of recursion, which makes the problem harder.
1
2  int g := -1;
3  int lastN := 0;
4  int factCache(int n) {
5    if (n <= 1) {
6      result := 1;
7    } else if (g != -1 && n == lastN) {
8      result := g;
9    } else {
10     g = n * factCache(n - 1);
11     lastN = n;
12     result := g;
13   }
14   return result;
15 }


Listing 1.1. Procedure factCache: returns n!, and memoizes most recent result. 


Motivating Example. We use procedure ‘factCache’ in Listing 1.1 as our running 
example. It returns n! for a given argument n, and caches the most recently 
computed result. It uses two private global variables, g and lastN, to implement 
the caching. g is initialized to −1. From the first call that computes a result 
onwards, g holds the most recently computed result and lastN holds the argument 
that produced it. Clearly this procedure is OP, and mimics the input-output 
behavior of a factorial procedure that does not cache any results. 
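To build intuition, Listing 1.1 can be transliterated into Python (our choice for illustration; the paper itself works with the core language of Sect. 2) and compared against a cache-free factorial on a random call sequence. This is only a test: agreement on sampled arguments suggests, but cannot prove, observational purity. The names `fact_cache`, `fact` and `agrees_on_random_calls` are hypothetical.

```python
import random

# Library-private globals, initialized once (Lines 2-3 of Listing 1.1).
g = -1
last_n = 0


def fact_cache(n: int) -> int:
    """Transliteration of factCache: returns n! and memoizes the last computed result."""
    global g, last_n
    if n <= 1:
        result = 1
    elif g != -1 and n == last_n:
        result = g  # cache hit
    else:
        g = n * fact_cache(n - 1)
        last_n = n
        result = g
    return result


def fact(n: int) -> int:
    """Cache-free reference; like factCache, it returns 1 for every n <= 1."""
    return 1 if n <= 1 else n * fact(n - 1)


def agrees_on_random_calls(trials: int = 1000, seed: int = 0) -> bool:
    """Call fact_cache on a random argument sequence and compare each result with fact."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(-3, 12)
        if fact_cache(n) != fact(n):
            return False
    return True


print(agrees_on_random_calls())  # True
```

Because the globals persist across calls, repeated arguments in the random sequence exercise the cache-hit branch (Line 8) as well as the recomputation branch (Lines 10-12).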


Proposed Approach. Our approach is based on Floyd-Hoare logic, which typically 
requires a specification of the procedure to be provided. One candidate 
specification would be a full functional specification of the procedure. If the user 
specifies that factCache realizes n!, then the verifier could replace Line 10 in 
the code with ‘g = n * (n − 1)!’. This, on paper, is sufficient to assert that 
Line 12 always assigns n! to result. However, to establish that Line 8 also does 
the same, an invariant would need to be provided that describes the possible 
values of g before an invocation of the procedure. In our example, a suitable 
invariant would be ‘(g = −1) ∨ (g = lastN!)’. The verifier would also need to 
verify that the invariant is re-established at the procedure’s exit. Lines 10-12, 
with the recursive call replaced by (n − 1)!, suffice on paper to re-establish the 
invariant. 

The candidate approach described above, while plausible, suffers from two 
weaknesses. First, a mathematical specification of the function being computed 
may be complex and non-trivial to write. (Note, for example, that factCache is 
defined for negative integers while factorial is not. Thus, the previous candidate 
specification is actually incorrect for this edge case.) Second, the underlying 
theorem prover would need to prove complex arithmetic properties, e.g., that 
n * (n − 1)! is equal to n!. Complex proofs such as this may be beyond the scope 
of many existing theorem provers. 

Our key insight is to sidestep these challenges by introducing a function symbol, 
say factCache, and replacing the recursive call with this function symbol for the 
purposes of verification. (Note that we reuse the same symbol for two purposes, 
which may be slightly confusing here. One denotes the procedure name, while the 
other denotes a function symbol for use in a logical formula. The italicized name 
here denotes the function symbol.) Intuitively, factCache represents the 
mathematical function that the given procedure mimics if the procedure is OP. In 
our example, Line 10 would become ‘g = n * factCache(n − 1)’. This step needs 
no human involvement. The approach needs an invariant; however, in a novel 
manner, we allow the invariant also to refer to factCache. In our example, a 
suitable invariant would be ‘(g = −1) ∨ (g = lastN * factCache(lastN − 1))’. This 
sort of invariant is relatively easy to construct; e.g., a human could arrive at it 
just by looking at Line 2 and reasoning locally about Lines 10 and 11. Given this 
invariant, (a) a theorem prover could infer that the condition in Line 7 implies 
that Line 8 necessarily copies the value of ‘n * factCache(n − 1)’ into ‘result’. 
Due to the transformation of Line 10 mentioned above, (b) the theorem prover can 
infer that Line 12 also does the same. Note that since these two expressions are 
syntactically identical, a theorem prover can easily establish that they are equal 
in value. Finally, since Line 6 is reached under a different condition than Lines 8 
and 12, the verifier has finished establishing that the procedure always returns 
the same expression in n for any given value of n. 
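The syntactic-identity argument can be mimicked with a toy term representation. This is a hypothetical encoding for illustration, not the paper's tool. Substituting n for lastN in the invariant's second disjunct, which is justified by the guards of Line 7, yields exactly the term that Lines 10 and 12 assign to result.

```python
# Hypothetical term representation: variables are strings, constants are ints,
# and compound terms are tuples (operator, arg1, arg2, ...). A use of the
# uninterpreted function symbol is encoded as ("call", "factCache", arg).


def subst(term, env):
    """Replace variables in `term` by the terms bound to them in `env`."""
    if isinstance(term, str):
        return env.get(term, term)
    if isinstance(term, tuple):
        return (term[0],) + tuple(subst(t, env) for t in term[1:])
    return term


# Second disjunct of the invariant: g = lastN * factCache(lastN - 1).
g_by_invariant = ("*", "lastN", ("call", "factCache", ("-", "lastN", 1)))

# Line 8 (cache hit): result := g. The guard g != -1 selects the second
# disjunct, and the guard n == lastN lets us substitute n for lastN.
line8_result = subst(g_by_invariant, {"lastN": "n"})

# Lines 10 and 12 after the transformation: g = n * factCache(n - 1); result := g.
line12_result = ("*", "n", ("call", "factCache", ("-", "n", 1)))

print(line8_result == line12_result)  # True
```

No arithmetic reasoning about factorials is needed. Plain structural equality of the two terms suffices, which is why uninterpreted function symbols are easy for a theorem prover.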

Similarly, using the modified Line 10 mentioned above and from Line 11, the 
prover can re-establish that g is equal to ‘lastN * factCache(lastN − 1)’ when 
control reaches Line 12. Hence, the necessary step of proving the given invariant 
to be a valid invariant is also complete. 

Note that the effectiveness of the approach depends on the nature of the given 
invariant. For instance, if the given invariant were ‘(g = −1) ∨ (g = lastN!)’, 
which is also technically correct, then the theorem prover may not be able to 
establish that in Lines 8 and 12 the variable ‘g’ always stores the same expression 
in n. However, we claim that the invariant ‘(g = −1) ∨ (g = lastN * 
factCache(lastN − 1))’ is in fact the one that is easier to infer by a human or by a 
potential tool, as argued two paragraphs above. 


Salient Aspects of Our Approach. This paper makes two significant 
contributions. First, it tackles the circularity problem that arises due to the use 
of a presumed-to-be OP procedure in assertions and invariants and the use of 
these invariants in proving the procedure to be OP. This requires us to prove 
the soundness of an approach that simultaneously verifies observational purity 
as well as the validity of invariants (as they cannot be decoupled). 

Second, we show that a direct approach to this verification problem (which 
we call the existential approach) reduces it to a problem of verifying that a 
logical formula is a tautology. The structure of the generated formula, however, 
makes the resulting theorem prover instances hard. We show how a conservative 
approximation can be used to convert this hard problem into an easier problem 
of checking satisfiability of a quantifier-free formula, which is within the 
scope of state-of-the-art theorem provers. 

The most closely related previous approaches are by Barnett et al. [1,2], and 
by Naumann [3]. These approaches check observational purity of procedures that 
maintain mutable global state. However, none of these approaches uses a function 
symbol in place of recursive calls or within invariants. Therefore, it is not clear 
that these approaches can verify recursive procedures. Barnett et al., in fact, 
state “there is a circularity - it would take a delicate argument, and additional 
conditions, to avoid unsoundness in this case”. To the best of our knowledge, 
ours is the first paper to show that it is feasible to check observational purity 
of procedures that maintain mutable global state for optimization purposes and 
that make use of recursion. 

$$
\begin{array}{lcl}
L \in \mathit{Lib} & ::= & \overline{g := c};\ P \\
P \in \mathit{Proc} & ::= & p\,(x)\ \{\ S;\ \mathtt{return}\ y\ \} \\
S \in \mathit{Stmt} & ::= & x := e \mid x := p(y) \mid S; S \mid \mathtt{if}\ (e)\ \mathtt{then}\ S\ \mathtt{else}\ S \\
e \in \mathit{Expr} & ::= & c \mid x \mid e\ \mathit{op}\ e \mid \mathit{unop}\ e \\
\mathit{op} \in \mathit{Ops} & ::= & + \mid - \mid / \mid * \mid \% \mid > \mid < \mid = \mid \wedge \mid \vee \\
\mathit{unop} \in \mathit{UnOps} & ::= & \neg
\end{array}
$$

$x, y \in \mathit{LocalId} \cup \mathit{GlobalId}$, $g \in \mathit{GlobalId}$, $c \in V$, $p \in \mathit{ProcId}$

Fig. 1. Programming language syntax and meta-variables 

Being able to verify that a procedure is OP has many potential applications. 
The most obvious one is that OP procedures can be memoized. That is, input- 
output pairs can be recorded in a table, and calls to the procedure can be 
elided whenever an argument is seen more than once. This would not change the 
semantics of the overall program that calls the procedure, because the procedure 
always returns the same value for the same argument (and mutates only private 
global variables). Another application is that if a loop contains a call to an OP 
procedure, then the loop can be parallelized (provided the procedure is modified 
to access and update its private global variables in a single atomic operation). 

The rest of this paper is structured as follows. Section 2 introduces the core 
programming language that we address. Section 3 provides formal semantics 
for our language, as well as definitions of invariants and observational purity. 
Section 4 describes our approach formally. Section 5 discusses an approach for 
generating an invariant automatically in certain cases. Section 6 describes eval- 
uation of our approach on a few realistic examples. Section 7 describes related 
work. More details about the proofs and the examples can be found in [4]. 


2 Language Syntax 


In this paper, we assume that the input to the purity checker is a library con- 
sisting of one or more procedures, with shared state consisting of one or more 
variables that are private to the library. We refer to these variables as “global” 
variables to indicate that they retain their values across multiple invocations of 
the library procedures, but they cannot be accessed or modified by procedures 
outside the library (that is, the clients of the library). 

In Fig. 1, we present the syntax of a simple programming language that we 
address in this paper. Given the foundational focus of this work, we keep the 
programming language very simple, but the ideas we present can be generalized. 
A return statement is required in each procedure, and is permitted only as 
the last statement of the procedure. The language does not contain any looping 
construct. Loops can be modelled as recursive procedures. The formal parameters 
of a procedure are read-only and cannot be modified within the procedure. We 
omit types from the language. We permit only variables of primitive types. In 
particular, the language does not allow pointers or dynamic memory allocation. 
Note that expressions are pure (that is, they have no side effects) in this language, 
and a procedure call is not allowed in an expression. Each procedure call is 
modelled as a separate statement. 

For simplicity of presentation, without loss of conceptual generality, we 
assume that the library consists of a single (possibly recursive) procedure, with 
a single formal parameter. In the sequel, we will use the symbol P (as a metavari- 
able) to represent this library procedure, p (as a metavariable) to represent the 
name of this procedure, and will assume that the name of the formal parameter 
is n. If the procedure is of the form “p (n) { S; return r }”, we refer to r 
as the return variable, and refer to “S; return r” as the procedure body and 
denote it as body(P). The library also contains, outside of the procedure’s code, 
a sequence of initializing declarations of the global variables used in the 
procedure, of the form “g1 := c1; ...; gN := cN”. These initializations are assumed 
to be performed once during any execution of the client application, just before 
the first call to the procedure P is placed by the client application. 

Throughout this paper we use the word ‘procedure’ to refer to the library 
procedure P, and use the word ‘function’ to refer to a mathematical function. 


3 A Semantic Definition of Purity 


In this section, we formalize the input-output semantics of the procedure P as a 
relation $\rightsquigarrow_P$, where $n \rightsquigarrow_P r$ indicates that an invocation of P with input n may 
return a result of r. The procedure is defined to be observationally pure if the 
relation $\rightsquigarrow_P$ is a (partial) function: that is, if $n \rightsquigarrow_P r_1$ and $n \rightsquigarrow_P r_2$, then $r_1 = r_2$. 

The object of our analysis is a single-procedure library, not the entire 
(client) application. (Our approach can be generalized to handle multi-procedure 
libraries.) The result of our analysis is valid for any client program that uses the 
procedure/library. The only assumptions we make are: (a) the shared state 
used by the library (the global variables) is private to the library and cannot 
be modified by the rest of the program, and (b) the client invokes the library 
procedures sequentially: no concurrent or overlapping invocations of the library 
procedures by a concurrent client are permitted. 

The following semantic formalism is motivated by the above observations. It 
can be seen as the semantics of the so-called “most general sequential client” 
of procedure P, which is the program: while (*) x = p (random());. The 
executions (of P) produced by this program include all possible executions (of 
P) produced by all sequential clients. 
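The most general sequential client can be approximated by a bounded loop. The sketch below uses hypothetical names. The nondeterministic `while (*)` is replaced by a fixed number of iterations, and arguments are drawn from a small range; both are assumptions of the sketch. It records an (input, output) pair at the exit of every trace, including nested recursive ones, which gives a finite sample of the input-output relation of P.

```python
import random

# Library-private globals, initialized once before the first call.
g, last_n = -1, 0
# Every (input, output) pair observed at the exit of some trace.
observed = set()


def fact_cache(n: int) -> int:
    global g, last_n
    if n <= 1:
        result = 1
    elif g != -1 and n == last_n:
        result = g
    else:
        g = n * fact_cache(n - 1)  # the nested invocation is a trace of its own
        last_n = n
        result = g
    observed.add((n, result))  # record at this trace's exit-state
    return result


def most_general_client(steps: int, seed: int = 1) -> None:
    """Bounded stand-in for `while (*) x = p(random());`."""
    rng = random.Random(seed)
    for _ in range(steps):
        fact_cache(rng.randint(-5, 10))


most_general_client(500)
inputs = [n for n, _ in observed]
print(len(inputs) == len(set(inputs)))  # True: no input was seen with two outputs
```

A finite run like this explores only some of the feasible executions. It can expose impurity but cannot establish purity, which is why the paper turns to a theorem prover.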

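Let G denote the set of global variables. Let L denote the set of local variables. 
Let V denote the set of numeric values (that the variables can take). An element 
$\rho_g \in \Sigma_G = G \to V$ maps global variables to their values. An element 
$\rho_\ell \in \Sigma_L = L \to V$ maps local variables to their values. We define a local continuation 
to be a statement sequence ending with a return statement. We use a local 
continuation to represent the part of the procedure body that still remains to 
be executed. Let $\Sigma_C$ represent the set of local continuations. The set of runtime 
states (or simply, states) is defined to be $(\Sigma_C \times \Sigma_L)^* \times \Sigma_G$, where the first 
component represents a runtime stack, and the second component the values of 
global variables. We denote individual states using symbols $\sigma, \sigma_1, \sigma_i$, etc. The 
runtime stack is a sequence, each element of which is a pair $(S, \rho_\ell)$ consisting 
of the remaining procedure fragment S to be executed and the values of local 
variables $\rho_\ell$. We write $(S, \rho_\ell)\gamma$ to indicate a stack where the topmost entry is 
$(S, \rho_\ell)$ and $\gamma$ represents the remaining part of the stack. 

$$
\begin{array}{c}
\dfrac{x \in L \qquad (\rho_\ell \uplus \rho_g, e) \Downarrow v}
{((x := e;\, S, \rho_\ell)\gamma, \rho_g) \to_P ((S, \rho_\ell[x \mapsto v])\gamma, \rho_g)}\ \text{[ASSIGN-LOCAL]}
\\[2ex]
\dfrac{x \in G \qquad (\rho_\ell \uplus \rho_g, e) \Downarrow v}
{((x := e;\, S, \rho_\ell)\gamma, \rho_g) \to_P ((S, \rho_\ell)\gamma, \rho_g[x \mapsto v])}\ \text{[ASSIGN-GLOBAL]}
\\[2ex]
(((S_1; S_2); S_3, \rho_\ell)\gamma, \rho_g) \to_P ((S_1; (S_2; S_3), \rho_\ell)\gamma, \rho_g)\ \text{[SEQ]}
\\[2ex]
\dfrac{(\rho_\ell \uplus \rho_g, e) \Downarrow \mathit{true}}
{(((\mathtt{if}\ (e)\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2); S_3, \rho_\ell)\gamma, \rho_g) \to_P ((S_1; S_3, \rho_\ell)\gamma, \rho_g)}\ \text{[IF-TRUE]}
\\[2ex]
\dfrac{(\rho_\ell \uplus \rho_g, e) \Downarrow \mathit{false}}
{(((\mathtt{if}\ (e)\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2); S_3, \rho_\ell)\gamma, \rho_g) \to_P ((S_2; S_3, \rho_\ell)\gamma, \rho_g)}\ \text{[IF-FALSE]}
\\[2ex]
\dfrac{(\rho_\ell \uplus \rho_g, e) \Downarrow v \qquad P = p\,(n)\ \{S_1\}}
{((y := p(e);\, S_2, \rho_\ell)\gamma, \rho_g) \to_P ((S_1, [n \mapsto v])(y := p(e);\, S_2, \rho_\ell)\gamma, \rho_g)}\ \text{[CALL]}
\\[2ex]
\dfrac{(\rho_\ell \uplus \rho_g, r) \Downarrow v}
{((\mathtt{return}\ r, \rho_\ell)(y := p(e);\, S, \rho'_\ell)\gamma, \rho_g) \to_P ((S, \rho'_\ell[y \mapsto v])\gamma, \rho_g)}\ \text{[RETURN]}
\\[2ex]
\dfrac{B = \mathit{body}(P) \qquad v \in V}
{([\,], \rho_g) \to_P ([(B, [n \mapsto v])], \rho_g)}\ \text{[TOP-LEVEL-CALL]}
\\[2ex]
([(\mathtt{return}\ r, \rho_\ell)], \rho_g) \to_P ([\,], \rho_g)\ \text{[TOP-LEVEL-RETURN]}
\end{array}
$$

Fig. 2. A small-step operational semantics for our language, represented as a relation 
$\sigma_1 \to_P \sigma_2$. A state $\sigma_i$ is a configuration of the form $((S, \rho_\ell)\gamma, \rho_g)$ where S captures 
statements to be executed in the current procedure, $\rho_\ell$ assigns values to local variables, $\gamma$ 
is the call-stack (excluding the current procedure), and $\rho_g$ assigns values to global variables. 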

We say that a state $((S, \rho_\ell)\gamma, \rho_g)$ is an entry-state if its location is at the 
procedure entry point (i.e., if S is the entire body of the procedure), and we 
say that it is an exit-state if its location is at the procedure exit point (i.e., if S 
consists of just a return statement). 
A procedure P determines a single-step execution relation $\to_P$, where $\sigma_1 \to_P \sigma_2$ 
indicates that execution proceeds from state $\sigma_1$ to state $\sigma_2$ in a single step. 
Figure 2 defines this semantics. The semantics of evaluation of a side-effect-free 
expression is captured by a relation $(\rho, e) \Downarrow v$, indicating that the expression e 
evaluates to value v in an environment $\rho$ (by environment, we mean an element of 
$(G \cup L) \to V$). We omit the definition of this relation, which is straightforward. 
We use the notation $\rho_1 \uplus \rho_2$ to denote the union of two disjoint maps $\rho_1$ and $\rho_2$. 
Note that most rules capture the usual semantics of the language constructs. 
The last two rules, however, capture the semantics of the most-general sequential 
client explained previously: when the call stack is empty, a new invocation of 
the procedure may be initiated (with an arbitrary parameter value). 

Note that all the following definitions are parametric over a given procedure 
P. E.g., we will use the word “execution” as shorthand for “execution of P”. 

We define an execution (of P) to be a sequence of states $\sigma_0 \sigma_1 \cdots \sigma_n$ such that 
$\sigma_i \to_P \sigma_{i+1}$ for all $0 \le i < n$. Let $\sigma_{init}$ denote the initial state of the library; 
i.e., this is the element of $\Sigma_G$ that is induced by the sequence of initializing 
declarations of the library, namely, “g1 := c1; ...; gN := cN”. We say that an 
execution $\sigma_0 \sigma_1 \cdots \sigma_n$ is a feasible execution if $\sigma_0 = \sigma_{init}$. Note, intuitively, a 
feasible execution corresponds to the sequence of states visited within the library 
across all invocations of the library procedure over the course of a single 
execution of the most-general client mentioned above; also, since the most-general 
client supplies a random parameter value to each invocation of P, in general 
multiple feasible executions of the library may exist. 

We define a trace (of P) to be a substring $\pi = \sigma_0 \cdots \sigma_n$ of a feasible execution 
such that: (a) $\sigma_0$ is an entry-state, (b) $\sigma_n$ is an exit-state, and (c) $\sigma_n$ corresponds 
to the return from the invocation represented by $\sigma_0$. In other words, a trace is a 
state sequence corresponding to a single invocation of the procedure. A trace may 
contain within it nested sub-traces due to recursive calls, which are themselves 
traces. Given a trace $\pi = \sigma_0 \cdots \sigma_n$, we define initial($\pi$) to be $\sigma_0$, final($\pi$) to be 
$\sigma_n$, input($\pi$) to be the value of the input parameter in initial($\pi$), and output($\pi$) to 
be the value of the return variable in final($\pi$). 

We define the relation $\rightsquigarrow_P$ to be $\{(\mathit{input}(\pi), \mathit{output}(\pi)) \mid \pi \text{ is a trace of } P\}$. 


Definition 1 (Observational Purity). A procedure P is said to be 
observationally pure if the relation $\rightsquigarrow_P$ is a (partial) function: that is, if for all $n, r_1, r_2$, 
if $n \rightsquigarrow_P r_1$ and $n \rightsquigarrow_P r_2$, then $r_1 = r_2$. 
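Definition 1 can be checked on a finite sample of (input, output) pairs. A check like the one below can only refute purity: a single input observed with two outputs is a counterexample, whereas agreement on a sample says nothing about unseen call sequences. The helper names are hypothetical, and `counter` is an illustrative impure procedure, not one from the paper.

```python
from typing import Callable, Iterable, List, Tuple


def is_partial_function(pairs: Iterable[Tuple[int, int]]) -> bool:
    """Definition 1 on a finite sample: n ~> r1 and n ~> r2 must imply r1 == r2."""
    first_output = {}
    for n, r in pairs:
        if first_output.setdefault(n, r) != r:
            return False
    return True


def run(proc: Callable[[int], int], args: Iterable[int]) -> List[Tuple[int, int]]:
    """Invoke `proc` sequentially, as a sequential client would, and collect (n, r)."""
    return [(n, proc(n)) for n in args]


calls = 0  # private state of the hypothetical procedure below


def counter(n: int) -> int:
    """Reads and writes private state, but is NOT observationally pure."""
    global calls
    calls += 1
    return n + calls


print(is_partial_function(run(lambda n: n * n, [3, 1, 3])))  # True
print(is_partial_function(run(counter, [3, 1, 3])))  # False: 3 ~> 4 and 3 ~> 6
```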


Logical Formula and Invariants. Our methodology makes use of logical 
formulae for different purposes, including to express a given invariant. Our logical 
formulae use the local and global variables in the library procedure as free 
variables, use the same operators as allowed in our language, and make use of 
universal as well as existential quantification. Given a formula $\varphi$, we write $\rho \models \varphi$ to 
denote that $\varphi$ evaluates to true when its free variables are assigned values from 
the environment $\rho$. 
As discussed in Sect. 1, one of our central ideas is to allow the names of the 
library procedures to be referred to in the invariant; e.g., our running example 
becomes amenable to our analysis using an invariant such as ‘(g = −1) ∨ (g = 
lastN * factCache(lastN − 1))’. We therefore allow the use of library procedure 
names (in our simplified presentation, the name p) as free variables in logical 
formulae. Correspondingly, we let each environment $\rho$ map each procedure name 
to a mathematical function in addition to mapping variables to numeric values, 
and extend the semantics of $\rho \models \varphi$ by substituting the values of both variables 
and procedure names in $\varphi$ from the environment $\rho$. 

Given an environment $\rho$, a procedure name p, and a mathematical function 
f, we will write $\rho[p \mapsto f]$ to indicate the updated environment that maps p to 
the value f and maps every other variable x to its original value $\rho[x]$. We will 
write $(\rho, f) \models \varphi$ to denote that $\rho[p \mapsto f] \models \varphi$. 

Given a state $\sigma = ((S, \rho_\ell)\gamma, \rho_g)$, we define $env(\sigma)$ to be $\rho_\ell \uplus \rho_g$, and given a 
state $\sigma = ([\,], \rho_g)$, we define $env(\sigma)$ to be just $\rho_g$. We write $(\sigma, f) \models \varphi$ to denote 
that $(env(\sigma), f) \models \varphi$. For any execution or trace $\pi$, we write $(\pi, f) \models \varphi$ if for 
every entry-state and exit-state $\sigma$ in $\pi$, $(\sigma, f) \models \varphi$. We now introduce another 
definition of observational purity. 


Definition 2 (Observational Purity wrt an Invariant). Given an 
invariant $\varphi^{inv}$, a library procedure P is said to satisfy pure($\varphi^{inv}$) if there exists a 
function f such that for every trace $\pi$ of P, output($\pi$) = f(input($\pi$)) and 
$(\pi, f) \models \varphi^{inv}$. 


It is easy to see that if procedure P satisfies pure($\varphi^{inv}$) for some candidate 
invariant $\varphi^{inv}$, then P is observationally pure as per Definition 1. 


4 Checking Purity Using a Theorem Prover 


In this section we provide two different approaches that, given a procedure P 
and a candidate invariant $\varphi^{inv}$, use a theorem prover to check conservatively 
whether procedure P satisfies pure($\varphi^{inv}$). 


4.1 Verification Condition Generation 


We first describe an adaptation of standard verification-condition generation 
techniques (e.g., see [5]) that we use as a common first step in both our 
approaches. Given a procedure P and a candidate invariant $\varphi^{inv}$, our goal is to 
compute a pair $(\varphi^{post}, \varphi^{vc})$ where $\varphi^{post}$ is a postcondition describing the state 
that exists after an execution of body(P) starting from a state that satisfies 
$\varphi^{inv}$, and $\varphi^{vc}$ is a verification-condition that must hold true for the execution 
to satisfy its invariants and assertions. 

We first transform the procedure body as below to create an internal repre- 
sentation that is input to the postcondition and verification condition generator. 
In the internal representation, we allow the following extra forms of statements 
(with their usual meaning): havoc(x), assume e, and assert e. 
1. For any assignment statement “x := e” where e contains x, we introduce a 
new temporary variable t and replace the assignment statement with “t := 
e; x := t”. 

2. For every procedure invocation “x := p(y)”, we first ensure that y is a local 
variable (by introducing a temporary if needed). We then replace the 
statement by the code fragment “assert $\varphi^{inv}$; havoc(g1); ...; havoc(gN); 
assume $\varphi^{inv} \wedge x = p(y)$”, where g1 to gN are the global variables. 
Note that the procedure call has been eliminated, and replaced with an 
“assume” statement that refers to the function symbol p. In other words, 
there are no procedure calls in the transformed procedure. 

3. We replace the “return x” statement by “assert $\varphi^{inv}$”. Note that we 
intentionally do not assert that the return value equals p(n). 
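The three transformation rules can be sketched over a tuple-encoded AST. The encoding, the helper names, and the placeholder string "INV" for the candidate invariant are assumptions for illustration. Rule 2's step of introducing a temporary for a non-local argument is omitted: the sketch assumes the argument is already a local variable.

```python
import itertools

# Hypothetical AST: statements are tuples
#   ("assign", x, e) | ("call", x, p, y) | ("seq", s1, s2)
#   | ("if", e, s1, s2) | ("return", x)
# and expressions are variable names (str), constants (int), or (op, arg, ...).
_fresh = itertools.count(1)


def vars_of(e) -> set:
    """Variables occurring in expression e."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return set().union(*(vars_of(a) for a in e[1:]))
    return set()


def tb(s, globals_, inv="INV"):
    """Sketch of TB(P, inv); `inv` is a placeholder for the candidate invariant."""
    kind = s[0]
    if kind == "assign":  # rule 1: split x := e when e mentions x
        _, x, e = s
        if x in vars_of(e):
            t = f"tmp{next(_fresh)}"
            return ("seq", ("assign", t, e), ("assign", x, t))
        return s
    if kind == "call":  # rule 2 (argument y assumed to be a local variable already)
        _, x, p, y = s
        out = ("assert", inv)
        for gv in globals_:
            out = ("seq", out, ("havoc", gv))
        return ("seq", out, ("assume", ("and", inv, ("=", x, ("app", p, y)))))
    if kind == "seq":
        return ("seq", tb(s[1], globals_, inv), tb(s[2], globals_, inv))
    if kind == "if":
        return ("if", s[1], tb(s[2], globals_, inv), tb(s[3], globals_, inv))
    if kind == "return":  # rule 3: deliberately no check that the result equals p(n)
        return ("assert", inv)
    raise ValueError(f"unknown statement {kind!r}")


else_branch = ("seq", ("assign", "t1", ("-", "n", 1)), ("call", "t2", "factCache", "t1"))
print(tb(else_branch, ["g", "lastN"]))
```

Applied to the else-branch of the running example, this yields the assert/havoc/assume shape that also appears in Listing 1.2.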


Let TB(P, $\varphi^{inv}$) denote the transformed body of procedure P obtained as above. 


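$$
\begin{array}{lcl}
\mathrm{POST}(\varphi^{pre}, x := e) &=& (\exists x.\,\varphi^{pre}) \wedge (x = e) \quad (\text{if } x \notin \mathit{vars}(e)) \\
\mathrm{POST}(\varphi^{pre}, \mathtt{havoc}(x)) &=& \exists x.\,\varphi^{pre} \\
\mathrm{POST}(\varphi^{pre}, \mathtt{assume}\ e) &=& \varphi^{pre} \wedge e \\
\mathrm{POST}(\varphi^{pre}, \mathtt{assert}\ e) &=& \varphi^{pre} \wedge e \\
\mathrm{POST}(\varphi^{pre}, S_1; S_2) &=& \mathrm{POST}(\mathrm{POST}(\varphi^{pre}, S_1), S_2) \\
\mathrm{POST}(\varphi^{pre}, \mathtt{if}\ e\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) &=& \mathrm{POST}(\varphi^{pre} \wedge e, S_1) \vee \mathrm{POST}(\varphi^{pre} \wedge \neg e, S_2) \\[1ex]
\mathrm{VC}(\varphi^{pre}, \mathtt{assert}\ e) &=& (\varphi^{pre} \Rightarrow e) \\
\mathrm{VC}(\varphi^{pre}, S_1; S_2) &=& \mathrm{VC}(\varphi^{pre}, S_1) \wedge \mathrm{VC}(\mathrm{POST}(\varphi^{pre}, S_1), S_2) \\
\mathrm{VC}(\varphi^{pre}, \mathtt{if}\ e\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) &=& \mathrm{VC}(\varphi^{pre} \wedge e, S_1) \wedge \mathrm{VC}(\varphi^{pre} \wedge \neg e, S_2) \\
\mathrm{VC}(\varphi^{pre}, S) &=& \mathit{true} \quad (\text{for all other } S) \\[1ex]
\mathrm{POSTVC}(P, \varphi^{inv}) &=& \big(\mathrm{POST}(\varphi^{inv}, \mathrm{TB}(P, \varphi^{inv})),\ \mathrm{VC}(\varphi^{inv}, \mathrm{TB}(P, \varphi^{inv})) \wedge (\mathrm{INIT}(P) \Rightarrow \varphi^{inv})\big)
\end{array}
$$

Fig. 3. Generation of verification-condition and postcondition. 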


We then compute postconditions as formally described in Fig. 3. This lets us 
compute, for each program point ℓ in the procedure, a condition $\varphi_\ell$ that describes 
what we expect to hold true when execution reaches ℓ if we start executing the 
procedure in a state satisfying $\varphi^{inv}$ and if every recursive invocation of the 
procedure also terminates in a state satisfying $\varphi^{inv}$. We compute this using the 
standard rules for the postcondition of a statement. For an assignment statement 
“x := e”, we use existential quantification over x to represent the value of x prior 
to the execution of the statement. If we rename these existentially quantified 
variables with unique new names, we can lift all the existential quantifiers to 
the outermost level. When transformed thus, the condition $\varphi_\ell$ takes the form 
$\exists x_1 \cdots x_n.\ \psi$, where $\psi$ is quantifier-free and $x_1, \ldots, x_n$ denote intermediate values 
of variables along the execution path from procedure-entry to program point ℓ. 

We compute a verification condition $\varphi^{vc}$ that represents the conditions we 
must check to ensure that an execution through the procedure satisfies its 
obligations: namely, that the invariant holds true at every call-site and at 
procedure-exit. Let ℓ denote a call-site or the procedure-exit. We need to check that 
$\varphi_\ell \Rightarrow \varphi^{inv}$ holds. Thus, the generated verification condition essentially consists 
of the conjunction of this check over all call-sites and procedure-exit. 

1  g := -1;
2  lastN := 0;
3  factCache(n) {
4    if (n <= 1) {
5      result := 1;
6    } else if (g != -1 && n == lastN) {
7      result := g;
8    } else {
9      t1 := n - 1;
10     // t2 := factCache(t1);
11     assert φ^inv;
12     havoc(g); havoc(lastN);
13     assume φ^inv ∧ (t2 = factCache(t1));
14     g := n * t2;
15     lastN := n;
16     result := g;
17   }
18   // return result;
19   assert φ^inv;
20 }

Listing 1.2. Procedure factCache from Listing 1.1 transformed to incorporate a 
supplied candidate invariant $\varphi^{inv}$. 

Finally, the function POSTVC computes the postcondition and verification 
condition for the entire procedure as shown in Fig. 3. (Thus, it returns a pair of 
formulae.) Note that this function also adds the check that the initial state must 
satisfy $\varphi^{inv}$ to the verification condition (as the basis condition for induction). 
INIT(P) is basically the formula “g1 = c1 ∧ ... ∧ gN = cN” (see Sect. 2). 


Example. We now illustrate the postcondition and verification condition 
generated from our factorial example presented in Listing 1.1. Listing 1.2 shows the 
example expressed in our language and transformed as described earlier (using 
function TB), using a supplied candidate invariant $\varphi^{inv}$. 


Figure 4 illustrates the computation of the postcondition and verification 
condition from this transformed example. In this figure, we use $\varphi^{pre}_{call}$ to denote the 
precondition computed to hold just before the recursive call-site, and $\varphi^{post}_{call}$ to 
denote the postcondition computed to hold just after the recursive call-site. The 
postcondition $\varphi^{post}$ (at the end of the procedure body) is itself a disjunction of 
three path-conditions representing execution through the three different paths 
in the program. In this illustration, we have simplified the logical conditions 
by omitting useless existential quantifications (that is, any quantification of the 
form $\exists x.\psi$ where x does not occur in $\psi$). Note that the existentially quantified 
g and lastN in $\varphi^{post}_{call}$ denote the values of these globals before the recursive call. 
Similarly, the existentially quantified g and lastN in $\varphi^{post}_3$ denote the values of 
these globals when the recursive call terminates, while the free variables g and 
lastN denote the final values of these globals. 
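$$
\begin{array}{lcl}
\mathrm{INIT}(P) &=& (g = -1) \wedge (\mathit{lastN} = 0) \\
\varphi^{post}_1 &=& \varphi^{inv} \wedge (n \le 1) \wedge (\mathit{result} = 1) \\
\varphi^{post}_2 &=& \varphi^{inv} \wedge \neg(n \le 1) \wedge (g \ne -1) \wedge (n = \mathit{lastN}) \wedge (\mathit{result} = g) \\
\varphi^{pre}_{call} &=& \varphi^{inv} \wedge \neg(n \le 1) \wedge \neg((g \ne -1) \wedge (n = \mathit{lastN})) \wedge (t_1 = n - 1) \\
\varphi^{post}_{call} &=& (\exists g, \mathit{lastN}.\ \varphi^{pre}_{call}) \wedge \varphi^{inv} \wedge (t_2 = \mathit{factCache}(t_1)) \\
\varphi^{post}_3 &=& (\exists g, \mathit{lastN}.\ \varphi^{post}_{call}) \wedge (g = n * t_2) \wedge (\mathit{lastN} = n) \wedge (\mathit{result} = g) \\
\varphi^{post} &=& \varphi^{post}_1 \vee \varphi^{post}_2 \vee \varphi^{post}_3 \\
\varphi^{vc} &=& (\varphi^{pre}_{call} \Rightarrow \varphi^{inv}) \wedge (\varphi^{post} \Rightarrow \varphi^{inv}) \wedge (\mathrm{INIT}(P) \Rightarrow \varphi^{inv})
\end{array}
$$

Fig. 4. The different formulae computed from the procedure in Listing 1.2 by our 
postcondition and verification-condition computation. 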


4.2 Approach 1: Existential Approach 


Let P be a procedure with input parameter n and return variable r. Let 
POSTVC(P, $\varphi^{inv}$) = $(\varphi^{post}, \varphi^{vc})$. Let $\psi^{ea}$ denote the formula $\varphi^{vc} \wedge (\varphi^{post} \Rightarrow (r = p(n)))$. 
Let $\bar{z}$ denote the sequence of all free variables in $\psi^{ea}$ except for p. We 
define EA(P, $\varphi^{inv}$) to be the formula $\forall \bar{z}.\,\psi^{ea}$. 

In this approach, we use a theorem prover to check whether EA(P, $\varphi^{inv}$) 
is satisfiable. As shown by the following theorem, satisfiability of EA(P, $\varphi^{inv}$) 
establishes that P satisfies pure($\varphi^{inv}$). 


Theorem 1. A procedure P satisfies pure($\varphi^{inv}$) if $\exists p.\,$EA(P, $\varphi^{inv}$) is a tautology 
(which holds iff EA(P, $\varphi^{inv}$) is satisfiable). 


Proof. Note that p is the only free variable in EA(P, $\varphi^{inv}$). Assume that $[p \mapsto f]$ 
is a satisfying assignment for $\forall \bar{z}.\,\psi^{ea}$. We show that for every feasible execution 
$\pi$: (P1) $(\pi, f) \models \varphi^{inv}$, and (P2) for every trace $\pi'$ inside $\pi$, output($\pi'$) = 
f(input($\pi'$)). This implies that P satisfies pure($\varphi^{inv}$). 

In particular, for any feasible execution $\pi$, we prove by induction over the 
execution steps in $\pi$ that 

1. For any entry state $\sigma$ in $\pi$, $(\sigma, f) \models \varphi^{inv}$. 
2. For any exit state $\sigma$ in $\pi$, $(\sigma, f) \models \varphi^{inv}$. 
3. For any exit state $\sigma$ in $\pi$, if it is the exit state of a trace $\pi'$, then output($\pi'$) = 
f(input($\pi'$)). 


If the above properties fail to hold, we can identify a trace $\pi'$ corresponding 
to the first such failure. It can be shown that the sequence of states visited by 
this trace, when substituted for $\bar{z}$, is a witness that $[p \mapsto f]$ is not a satisfying 
assignment for $\forall \bar{z}.\,\psi^{ea}$. This contradicts our original assumption. 

Please see [4] for more details of the proof. 
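Checking the condition of Theorem 1 requires a prover to find an interpretation of p and to reason about all integers. As an executable sanity check only, one can fix a guessed interpretation f of p for the running example, enumerate a small range of values for the remaining variables, and evaluate the transformed body of Listing 1.2 directly. The bounds, the guessed function `fact_like`, and all names are assumptions. Passing this bounded check is not a proof.

```python
from typing import Callable

Fn = Callable[[int], int]


def inv(g: int, last_n: int, f: Fn) -> bool:
    """Candidate invariant (g = -1) or (g = lastN * factCache(lastN - 1)),
    with the function symbol factCache interpreted by f."""
    return g == -1 or g == last_n * f(last_n - 1)


def check_psi_ea(f: Fn, lo: int = -4, hi: int = 8) -> bool:
    """Bounded check of psi_ea = vc and (post => result = p(n)) for Listing 1.2,
    with p interpreted as f and all variables ranging over small integers."""
    if not inv(-1, 0, f):  # INIT(P) => inv
        return False
    # Every global state (g, lastN) with lastN in [lo, hi] that satisfies inv.
    states = {(-1, ln) for ln in range(lo, hi + 1)}
    states |= {(ln * f(ln - 1), ln) for ln in range(lo, hi + 1)}
    for g0, last0 in states:
        for n in range(lo, hi + 1):
            if n <= 1:
                result, g, last_n = 1, g0, last0
            elif g0 != -1 and n == last0:
                result, g, last_n = g0, g0, last0
            else:
                if not inv(g0, last0, f):  # assert inv at the call-site
                    return False
                # havoc(g); havoc(lastN); assume inv and t2 = p(t1): the havocked
                # values are irrelevant here because both globals are overwritten.
                t2 = f(n - 1)
                g, last_n = n * t2, n
                result = g
            # Exit: assert inv, plus the extra conjunct of psi_ea, result = p(n).
            if not inv(g, last_n, f) or result != f(n):
                return False
    return True


def fact_like(k: int) -> int:
    """Hypothetical candidate for p: the function factCache computes."""
    return 1 if k <= 1 else k * fact_like(k - 1)


print(check_psi_ea(fact_like))  # True
print(check_psi_ea(lambda k: k))  # False
```

The second call shows the role of the existential quantifier over p: the same body and invariant fail under a wrong interpretation, so a prover must find a suitable f rather than check an arbitrary one.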
4.3 Approach 2: Impurity Witness Approach 


The existential approach presented in the previous section has a drawback. 
Checking satisfiability of EA(P, $\varphi^{inv}$) is hard because it contains universal 
quantifiers, which existing theorem provers do not handle well enough for this 
approach to be practical. We now present an approximation of the existential 
approach that is easier to use with existing theorem provers. This new approach, 
which we will refer to as the impurity witness approach, reduces the problem to 
that of checking whether a quantifier-free formula is unsatisfiable, which is better 
suited to the capabilities of state-of-the-art theorem provers. This approach 
focuses on finding a counterexample to show that the procedure is impure or that 
it violates the candidate invariant. 

Let P be a procedure with input parameter n and return variable r. Let 
POSTVC(P, $\varphi^{inv}$) = $(\varphi^{post}, \varphi^{vc})$. Let $\varphi^{post}_a$ denote the formula obtained by replacing 
every free variable x other than p in $\varphi^{post}$ by a new free variable $x_a$. Define 
$\varphi^{post}_b$ similarly. Define IW(P, $\varphi^{inv}$) to be the formula $(\neg\varphi^{vc}) \vee (\varphi^{post}_a \wedge \varphi^{post}_b \wedge 
(n_a = n_b) \wedge (r_a \ne r_b))$. 

The impurity witness approach checks whether IW(P, $\varphi^{inv}$) is satisfiable. This 
can be done by separately checking whether $\neg\varphi^{vc}$ is satisfiable and whether 
$(\varphi^{post}_a \wedge \varphi^{post}_b \wedge (n_a = n_b) \wedge (r_a \ne r_b))$ is satisfiable. As formally defined, $\varphi^{vc}$ 
and $\varphi^{post}$ contain embedded existential quantifications. As explained earlier, 
these existential quantifiers can be moved to the outside after variable renaming 
and can be omitted for a satisfiability check. (A formula of the form $\exists \bar{x}.\,\psi$ is 
satisfiable iff $\psi$ is satisfiable.) As usual, these existential quantifiers refer to 
intermediate values of variables along an execution path. Finding a satisfying 
assignment to these variables essentially identifies a possible execution path (that 
satisfies some other property). 


Theorem 2. A procedure P satisfies pure($\varphi^{inv}$) if IW(P, $\varphi^{inv}$) is unsatisfiable. 


Proof. We say that two traces disagree if they receive the same argument value 
but return different values. We say that a pair of feasible executions $(\pi_1, \pi_2)$ is 
an impurity witness if there is a trace $\pi_a$ in $\pi_1$ and a trace $\pi_b$ in $\pi_2$ such that $\pi_a$ 
and $\pi_b$ disagree. 

A trace is said to be compatible with a function f (and vice versa) if the 
trace’s input-output behavior matches that of the function. An execution is said 
to be compatible with a function (and vice versa) if every trace in the execution 
is compatible with the function. We say that a feasible execution $\pi$ strongly 
satisfies $\varphi^{inv}$ if for every function f that is compatible with $\pi$, $(\pi, f) \models \varphi^{inv}$. 

We prove the theorem using the following lemmas: if IW(P, $\varphi^{inv}$) is 
unsatisfiable, then Lemmas 2 and 3 imply that the preconditions of Lemma 1 hold and, 
hence, P satisfies pure($\varphi^{inv}$). 


1. If there exists no impurity witness, and every feasible execution strongly 
satisfies $\varphi^{inv}$, then P satisfies pure($\varphi^{inv}$). 

2. If a feasible execution $\pi$ that does not strongly satisfy $\varphi^{inv}$ exists, then 
IW(P, $\varphi^{inv}$) is satisfiable. 

3. If an impurity witness exists, then IW(P, $\varphi^{inv}$) is satisfiable. 


Lemma 1 is straightforward. 

For Lemma 2, we use a “minimal” feasible execution $\pi$ that does not strongly satisfy 
$\varphi^{inv}$ to construct a satisfying assignment to $\neg\varphi^{vc}$. 

For Lemma 3, we use a “minimal” impurity witness to construct a satisfying 
assignment to $(\varphi^{post}_a \wedge \varphi^{post}_b \wedge (n_a = n_b) \wedge (r_a \ne r_b))$. 

Please see [4] for more details of the proof. 


5 Generating the Invariant 


We now describe a simple but reasonably effective semi-algorithm for generating 
a candidate invariant automatically from the given procedure. Our approach of 
Sect. 4 can be used with a manually provided invariant or the candidate invariant 
generated by this semi-algorithm (whenever it terminates). 

The invariant-generation approach is iterative and computes a sequence of
progressively weaker candidate invariants $I_0, I_1, \ldots$, and terminates if and when
$I_m = I_{m+1}$, at which point $I_m$ is returned as the candidate invariant. The
initial candidate invariant $I_0$ captures the initial values of the global variables.
In iteration $k$, we apply a procedure similar to the one described in Sect. 4 and
compute the strongest conditions that hold at every program point if the
execution of the procedure starts in a state satisfying $I_{k-1}$ and every recursive
invocation terminates in a state satisfying $I_{k-1}$. We then take the disjunction
of the conditions computed at the points before the recursive call-sites and at
the end of the procedure, and existentially quantify all local variables. We refer
to the resulting formula as $\mathrm{NEXT}(I_{k-1}, \mathrm{TB}(P, I_{k-1}))$. We take the disjunction of
this formula with $I_{k-1}$ and simplify it to get $I_k$.
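The outer loop of this semi-algorithm is a fixpoint iteration. The sketch below is a minimal illustration under a strong assumption: a candidate invariant is represented extensionally as a finite set of global states, so disjunction is set union and SIMPLIFY is the identity. `next_fn` is a hypothetical stand-in for $\mathrm{NEXT}(I_{k-1}, \mathrm{TB}(P, I_{k-1}))$, and the iteration cap reflects that a semi-algorithm need not terminate.

```python
from typing import Callable, FrozenSet, Hashable

# Hypothetical extensional model: an invariant is the finite set of global
# states it admits, so "∨" is set union and SIMPLIFY is a no-op.
Invariant = FrozenSet[Hashable]


def generate_invariant(init: Invariant,
                       next_fn: Callable[[Invariant], Invariant],
                       max_iters: int = 1000) -> Invariant:
    """Iterate I_k = I_{k-1} ∨ NEXT(I_{k-1}) until I_m = I_{m+1}.

    next_fn stands in for NEXT(I_{k-1}, TB(P, I_{k-1})). The loop is capped
    because the semi-algorithm need not terminate in general.
    """
    current = init
    for _ in range(max_iters):
        nxt = frozenset(current | next_fn(current))
        if nxt == current:  # I_m = I_{m+1}: return the candidate invariant
            return current
        current = nxt
    raise RuntimeError("no fixpoint reached within max_iters")
```

For example, starting from `frozenset({0})` with `next_fn = lambda inv: frozenset(min(s + 1, 3) for s in inv)`, the candidates grow to `{0, 1, 2, 3}` and then stabilize.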

Figure 5 formalizes this semi-algorithm. Here, we exploit the fact that the
assert statements are added precisely at every recursive call-site and at the end of the
procedure, and these are exactly the places whose conditions are disjoined.

In our running example, $I_0$ is ‘$g = -1 \land \mathit{lastN} = 0$’. Applying NEXT to $I_0$
yields $I_0$ itself as the pre-condition at the point just before the recursive call-site,
and ‘$(g = -1 \land \mathit{lastN} = 0) \lor g = \mathit{lastN} * p(\mathit{lastN} - 1)$’ (after certain simplifications)
as the pre-condition at the end of the procedure. Therefore, $I_1$ is
‘$(g = -1 \land \mathit{lastN} = 0) \lor g = \mathit{lastN} * p(\mathit{lastN} - 1)$’. When we apply NEXT to $I_1$,


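$$
\begin{aligned}
I_0 &= \mathrm{INIT}(P)\\
I_k &= \mathrm{SIMPLIFY}\big(I_{k-1} \lor \mathrm{NEXT}(I_{k-1}, \mathrm{TB}(P, I_{k-1}))\big)
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{NEXT}(\psi^{pre}, \texttt{assert}\ e) &= \exists \ell_1 \cdots \ell_m.\ \psi^{pre} && (\ell_1, \ldots, \ell_m \text{ are the local variables in } \psi^{pre})\\
\mathrm{NEXT}(\psi^{pre}, S_1; S_2) &= \mathrm{NEXT}(\psi^{pre}, S_1) \lor \mathrm{NEXT}(\mathrm{POST}(\psi^{pre}, S_1), S_2)\\
\mathrm{NEXT}(\psi^{pre}, \texttt{if}\ e\ \texttt{then}\ S_1\ \texttt{else}\ S_2) &= \mathrm{NEXT}(\psi^{pre} \land e, S_1) \lor \mathrm{NEXT}(\psi^{pre} \land \neg e, S_2)\\
\mathrm{NEXT}(\psi^{pre}, S) &= \mathit{false} && (\text{for all other } S)
\end{aligned}
$$

Fig. 5. Iterative computation of invariant.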
the computed pre-conditions are $I_1$ itself at both the program points mentioned
above. Therefore, the approach terminates with $I_1$ as the candidate invariant.
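The NEXT rules of Fig. 5 can be executed directly if formulas are replaced by explicit sets of states. The sketch below is a toy instantiation under stated assumptions, not the paper's symbolic implementation: a formula is the finite set of states that satisfy it, statements use a hypothetical tuple encoding (`("assert",)`, `("assign", var, fn)`, `("seq", s1, s2)`, `("if", e, s1, s2)`), POST treats `assert` as a no-op, and recursive calls (the function symbol $p$) are not modeled.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical extensional model: a formula is the finite set of states that
# satisfy it; a state is a sorted tuple of (variable, value) pairs.
State = Tuple[Tuple[str, int], ...]
Formula = FrozenSet[State]
Expr = Callable[[Dict[str, int]], bool]


def mk(d: Dict[str, int]) -> State:
    return tuple(sorted(d.items()))


def conj(psi: Formula, e: Expr, holds: bool = True) -> Formula:
    """psi ∧ e, or psi ∧ ¬e when holds is False."""
    return frozenset(s for s in psi if e(dict(s)) == holds)


def post(psi: Formula, stmt) -> Formula:
    """Strongest postcondition; assert is treated as a no-op (assumption)."""
    kind = stmt[0]
    if kind == "assign":
        _, var, fn = stmt
        return frozenset(mk({**dict(s), var: fn(dict(s))}) for s in psi)
    if kind == "seq":
        return post(post(psi, stmt[1]), stmt[2])
    if kind == "if":
        _, e, s1, s2 = stmt
        return post(conj(psi, e), s1) | post(conj(psi, e, False), s2)
    return psi


def next_(psi: Formula, stmt, local_vars: FrozenSet[str]) -> Formula:
    kind = stmt[0]
    if kind == "assert":  # ∃ locals. psi: drop local variables from each state
        return frozenset(tuple(kv for kv in s if kv[0] not in local_vars) for s in psi)
    if kind == "seq":
        return next_(psi, stmt[1], local_vars) | next_(post(psi, stmt[1]), stmt[2], local_vars)
    if kind == "if":
        _, e, s1, s2 = stmt
        return next_(conj(psi, e), s1, local_vars) | next_(conj(psi, e, False), s2, local_vars)
    return frozenset()  # NEXT(psi, S) = false for all other S
```

As in the paper, only the states reaching an `assert` contribute, and the local variables are projected away there, so the result mentions only globals.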


6 Evaluation 


We have implemented our OP checking approach as a prototype using the Boogie 
framework [6], and have evaluated the approach using this implementation on 
several examples. The objective of this evaluation was primarily a sanity check:
to test how our approach performs on a set of both OP and non-OP procedures.

We tried several simple non-OP programs, and our implementation termi- 
nated with a “no” answer on all of them. We also tried the approach on several 
OP procedures: (1) the ‘factCache’ running example, (2) a version of a factorial 
procedure that caches all arguments seen so far and their corresponding return 
values in an array, (3) a version of factorial that caches only the return value for 
argument value 19 in a scalar variable, (4) a recursive procedure that returns
the nth Fibonacci number and caches all its arguments and corresponding return
values seen so far in an array, and (5) a “matrix chain multiplication” (MCM)
procedure. The last example is based on dynamic programming, and hence nat- 
urally uses a table to memoize results for sub-problems. Here, observational 
purity implies that the procedure always returns the same solution for a given 
sub-problem, whether a hit was found in the table or not. The appendix of a 
technical report associated with this paper depicts all the procedures mentioned 
above as created by us directly in Boogie’s language, as well as the invariants 
that we supplied manually (in SMT2 format). 

It is notable that the theorem prover was not able to handle the instances 
generated by the “existential approach” even for simple examples. The “impurity 
witness” approach, however, terminated on all the examples mentioned above 
with the correct answer, with the theorem prover taking less than 1s on each 
example. Please see [4] for more information about the examples used in our 
evaluation. 


7 Related Work 


The previous work that is most closely related to our work is by Barnett 
et al. [1,2]. Their approach is based on the same notion of observational purity as 
our approach. Their approach is structurally similar to ours, in terms of needing 
an invariant, and using an inductive check for both the validity of the invariant 
as well as the uniqueness of return values for a given argument. However, their 
approach is based on a more complex notion of invariant than our approach, 
which relates pairs of global states, and does not use a function symbol to repre- 
sent recursive calls within the procedure. Hence, their approach does not extend 
readily to recursive procedures; they in fact state that “there is a circularity - 
it would take a delicate argument, and additional conditions, to avoid unsound- 
ness in this case”. Our idea of allowing the function symbol in the invariant to 
represent the recursive call allows recursive procedures to be checked, and also
simplifies the specification of the invariant in many cases. 

Cok et al. [7] generalize Barnett et al.’s work, and suggest classifying
procedures into the categories “pure”, “secret”, and “query”. The “query”
procedures are observationally pure. Again, recursive procedures are not addressed.

Naumann [3] proposes a notion of observational purity that is the same as
ours. The paper gives a rigorous but manual methodology for proving the
observational purity of a given procedure. This methodology is not similar to
ours; rather, it is based on finding a weakly pure procedure that simulates the given
procedure as far as externally visible state changes and the return value are 
concerned. They have no notion of an invariant that uses a function symbol 
that represents the procedure, and they don’t explicitly address the checking of 
recursive procedures. 

There exists a significant body of work on identifying differences between two 
similar procedures. For instance, differential assertion checking [8] is a representative
of this body; it checks whether two procedures can ever start from
the same state but end in different states such that exactly one of the ending 
states fails a given assertion. Their approach is based on logical reasoning, and 
accommodates recursive procedures. Our impurity witness approach has some 
similarity with their approach, because it is based on comparing the given pro- 
cedure with itself. However, our comparison is stricter, because in our setting, 
starting with a common argument value but from different global states that 
are both within the invariant should not cause a difference in the return value. 
Furthermore, technically our approach is different because we use an invariant 
that refers to a function symbol that represents the procedure being checked, 
which is not a feature of their invariants. Partush et al. [9] solve a similar prob- 
lem as differential assertion checking, but using abstract interpretation instead 
of logical reasoning. 

There is a substantial body of work on checking if a procedure is pure, in the 
sense that it does not modify any objects that existed before the procedure was 
invoked, and does not modify any global variables. Salcianu et al. [10] describe 
a static analysis to check purity and Madhavan et al. [11] present an abstract- 
interpretation based generalization of this analysis. Various tools exist, such as 
JML [12] and Spec# [13], that use logical techniques based on annotations to 
prove procedures pure. Purity is a more restrictive notion than observational
purity; procedures such as our ‘factCache’ example are observationally pure, but 
not pure because they use as well as update state that persists between calls to 
the procedure. 
Abstract. In this paper we address the challenge of cross-language clone
detection. Due to the rise of cross-language libraries and applications 
(e.g., apps written for both Android and iPhone), it has become com- 
mon for code fragments in one language to be ported over into another 
language in an extension of the usual “copy and paste” coding methodol- 
ogy. As with single-language clones, it is important to be able to detect 
these cross-language clones. However, there are many real-world cross-language
clones that existing techniques cannot detect.

We describe the first general, cross-language algorithm that combines 
both structural and nominal similarity to find syntactic clones, thereby 
enabling more complete clone detection than any existing technique. This 
algorithm also performs comparably to the state of the art in single- 
language clone detection when applied to single-language source code; 
thus it generalizes the state of the art in clone detection to detect both 
single- and cross-language clones using one technique. 


1 Introduction 


The clone detection problem has long been recognized by the community, with 
many existing papers exploring different techniques for finding clones amongst 
code written in a single language [5,13,14,21,22]. However, in recent years 
an interesting twist has arisen due to the rising popularity of cross-language 
libraries and applications: cross-language clones. Consider the parser genera- 
tor ANTLR [3], which has runtimes that are written in C#, C++, Go, Java, 
JavaScript, Python (2 and 3), and Swift. Also consider multi-platform mobile 
applications, which are often ported between Java and Objective-C or Swift, 
the languages used by Android and iPhone applications. In these kinds of set- 
tings, clones can actually cross language boundaries: a fragment of code in one 
language can be copied and massaged to conform to the syntax and seman- 
tics of another language. Existing single-language clone detection techniques are 
unable to effectively detect these sorts of cross-language clones. In this paper we 
propose a method to detect cross-language clones and demonstrate that it (1) 
finds cross-language clones that no existing method can detect; and (2) performs 
comparably to existing single-language clone detectors for finding clones within 
a corpus of single-language code sources. Therefore, our technique generalizes 
Trees._findAllNodes = function(t, index, findTokens, nodes) {
    // check this node (the root) first
    if(findTokens && (t instanceof TerminalNode)) {
        if(t.symbol.type===index) {
            nodes.push(t);
        }
    } else if(!findTokens && (t instanceof ParserRuleContext)) {
        if(t.ruleIndex===index) {
            nodes.push(t);
        }
    }
    // check children
    for(var i=0;i<t.getChildCount();i++) {
        Trees._findAllNodes(t.getChild(i), index, findTokens, nodes);
    }
};

template<typename T>
static void _findAllNodes(ParseTree *t, size_t index, bool findTokens, std::vector<T> &nodes) {
  // check this node (the root) first
  if (findTokens && is<TerminalNode *>(t)) {
    TerminalNode *tnode = dynamic_cast<TerminalNode *>(t);
    if (tnode->getSymbol()->getType() == index) {
      nodes.push_back(t);
    }
  } else if (!findTokens && is<ParserRuleContext *>(t)) {
    ParserRuleContext *ctx = dynamic_cast<ParserRuleContext *>(t);
    if (ctx->getRuleIndex() == index) {
      nodes.push_back(t);
    }
  }

  // check children
  for (size_t i = 0; i < t->children.size(); i++) {
    _findAllNodes(t->children[i], index, findTokens, nodes);
  }
}

Fig. 1. A JavaScript (top) and C++ (bottom) clone pair doing a pre-order search. 


VerletParticle2D.prototype.setWeight = function(w){
    this.weight = w;
    this.invWeight = (w !== 0) ? 1 / w : 0; //avoid divide by zero
};

public void setWeight(float w) {
    weight = w;
    invWeight = 1f / w;
}


Fig. 2. A JavaScript (top) and Java (bottom) clone pair setting the weight and inverse
weight of a particle in a graphics application. A bug-fix has been applied to the
JavaScript clone but not the Java clone.


the current state of the art in clone detection by extending it to allow for both 
single-language and cross-language clone detection using a single technique. 

To make this problem more concrete, consider Fig. 1, which shows a real-life 
case (found during our evaluation described in Sect. 6) of code clones involving
C++ and JavaScript source code from the ANTLR parser generator [3]. To 
demonstrate the importance of finding cross-language clones, consider Fig. 2, 
which shows another real-life case (also found during our evaluation) of code 
clones involving JavaScript and Java in which a bug-fix has been applied to 
one of the clones but not the other. In addition, a quick search of the CVE 
(Common Vulnerabilities and Exposures) database yields a vulnerability due 
to incorrect message authentication checking that exists in multiple different 
language implementations of the relevant code [9]. 

There are only four existing papers that we are aware of that introduce 
new techniques for cross-language clone detection (discussed in more detail in 
Sect. 2). That initial work has either focused on clones across languages that
share a common intermediate representation such as .NET [1,15] or has deviated 
from classical clone detection and taken a more restricted, natural language- 
based approach, sometimes relying on assumptions that may not be met in real 
code [7,8]. None of that existing work would detect the clone examples given in 
Figs. 1 and 2 without extensive modification. 
The main reason for these restrictions in previous work is that the syntactic
structure (i.e., parse trees) of different languages can be extremely different
even for code that, at the source level, seems similar. We demonstrate this phe- 
nomenon later in this paper. In order to overcome this problem, previous work 
has either restricted itself to languages with a common intermediate representa- 
tion (thus enforcing that the syntactic structure is similar for similar code) or 
abandoned structural matching entirely and looked only at the names of variables 
and other user-defined abstractions (what we call nominal clone detection). We 
observe that using purely structural or purely nominal matching is sub-optimal 
in a cross-language setting, in that each can yield both false positives and false 
negatives. 

Our technique consists of (1) a method for enabling structural matching for 
cross-language clones even in those cases where syntactic structure is different 
(Sect. 4); and (2) a method for composing both structural and nominal matching 
into a singular matcher, maintaining the strengths of each while mitigating their 
individual weaknesses (Sect. 5). We have implemented our technique in a tool
called FETT¹ that works at the granularity of function pairs; we use FETT to
empirically compare our proposed technique against existing techniques (Sect. 6). 
We begin by describing related work and background information in Sect. 2 and 
giving a high-level overview of our technique in Sect. 3. 


2 Background and Related Work 


The concept of clone detection is not new, and the different techniques involved 
have been surveyed extensively [5,21]. Most existing non-semantics-based tech- 
niques can be categorized into the classes of “structural,” “nominal,” or “hybrid,” 
which we define below. 

Before we begin, there is a bit of misleading terminology in the literature: 
there exist many clone detection tools that are considered language-generic or 
language-agnostic (e.g., [22]), but can only be configured to work for programs 
written in a single language at a time. CCFinder [14], for example, can detect 
clones for six different programming languages; however, the user cannot (outside 
of naive text-only modes) truly cross language boundaries during a “language- 
generic” clone detection phase. 


2.1 What Exactly Is a Cross-Language Clone? 


Intuitively, we consider a cross-language clone to be the same as any same- 
language clone—two pieces of code that implement similar functionality—the 
only difference is the setting. We highlight here what kinds of clones our tool 
is able to find, and what kinds of clones we include in our evaluation based on 
their classification (i.e., Type I, II, III, or IV [24]).


1 Our implementation is located at http://www.cs.ucsb.edu/~pllab under the 
“Downloads” link. 
The usual code clone hierarchy does not translate well to a cross-language
setting: type I and type II clones [24] may not exist across languages because 
of syntactic differences between languages (e.g., switch statements exist in C 
but not in Python). In this paper, we present methods that discover syntactic 
clones modulo the differences in language syntax, and we do this by creating 
a correspondence between related but different constructs. We do not consider 
semantic (type IV) clones that implement the same functionality in a different 
way (e.g., quicksort vs. selection sort). Readers familiar with the standard clone 
hierarchy can think of the clones that we find as type III clones generalized 
across languages. 


2.2 Structural Program Similarity 


Intuitively, two programs (or subprograms) can be considered similar if they look 
the same, disregarding identifier names—i.e., if their syntax trees have roughly 
the same shape. We refer to structural clone detection as the process of taking 
advantage of this similarity. 

Same-language clone detection tools usually also consider identifier data, 
and we are not aware of any purely structural cross-language clone detector. A 
notable same-language tool that operates via structural similarity is Deckard, 
which converts syntax trees into vectors for fast comparison [13]. 

Structural similarity is useful in all settings, but it is a hard problem in a 
multi-language setting—all the hybrid structural/nominal methods we describe 
below make some restriction on the languages involved. A major part of the 
novelty of our technique is a method for purely structural matching across lan- 
guages (though the final algorithm then combines structural with nominal (i.e., 
identifier-based) techniques for greater accuracy). 


2.3 Nominal Program Similarity 


Whereas structural similarity disregards identifiers and instead looks at code 
shape, nominal similarity does the exact opposite. Nominal similarity relies on 
the insight that similar code, especially copied and pasted snippets, will have 
the same identifier names throughout, regardless of code structure. 

Notable same-language clone detection tools that operate via nominal simi- 
larity are CCFinder and SourcererCC, which compare program tokens [14,25]. 


Across Languages. Cheng et al. describe CLCMiner [8], the first cross- 
language clone detection tool that does not require the languages involved to 
translate to the same intermediate form. It compares revision histories (diffs) 
in repository logs for cross-platform C# and Java programs; the tokens inside 
commits are used to compute similarity scores. CLCMiner is the basis for the 
Nominal algorithm defined in Sect. 5.1. 

Cheng et al. study a different notion of nominal similarity in [7], where they 
measure the effectiveness of token distributions in finding clones among cross- 
platform mobile applications; they obtain a negative result for identifier names 
alone. Flores et al. [10] use natural language processing techniques to discover
cross-language clones at the function level.


2.4 Hybrid Program Similarity 


It is logical to combine structural and nominal similarity methods, as the results 
they provide are complementary. A notable same-language, hybrid clone detec- 
tion tool is NiCad, which performs its comparisons at the parse tree level [23]. 
Syntax tree-based comparison is quite common [4,27]. 

Tree similarity is computationally expensive [6], and it is more efficient to 
linearize programs in some way; sequence similarity algorithms can then do 
the comparison. Existing same-language work compares the tokens in the order 
in which they appear in the parse tree [11], and we also take advantage of 
linearization of full parse trees in this work. 


Across Languages. Kraft et al. present C2D2 [15], the first cross-language 
clone detection tool, for C# and Visual Basic programs. This work requires that 
the languages involved be compiled to the same intermediate representation 
(IR)—.NET IR in this case. From a graph derived from that IR, they create 
sequences of tokens for subgraphs and use a Levenshtein distance-based token 
similarity algorithm to compare them. 

Al-Omari et al. build on Kraft et al.’s work and find clones by comparing 
CIL intermediate code text [1]. Again, they are restricted to .NET languages. 


This work. Our method is a hybrid method, works on any language with a 
grammar definition, and relies on just the source code (in contrast to, e.g., 
CLCMiner which requires the existence of revision history). We linearize pre- 
processed parse trees at the function level and compare the linearized sequences 
in a novel way that generalizes Kraft et al.’s work and incorporates features of 
Cheng et al.’s work. 


2.5 CLCMiner 


Our main comparison is with the only tool designed for cross-language clone 
detection and capable of handling arbitrary languages: CLCMiner [8]. We pro- 
vide further background on it here. CLCMiner is based on having the source 
code in a version control system, and requires a revision history by design. 
Section 5.1 gives a detailed explanation of our adaptation of CLCMiner. The 
original CLCMiner algorithm works on diffs and lexes them, whereas our ver- 
sion works on function parse trees. 

We were not able to obtain access to the original CLCMiner source code 
from the authors. In order to compare against this method, we implement our
own version, which adapts CLCMiner to work with the entire text of a function
and calculates CLCMiner’s distance metric for a given function pair. Our
new implementation may perform better or worse than the original (which uses 
revision history rather than function pairs) in certain cases. 
We incorporate CLCMiner’s distance metric in a novel way in FETT, and
show that our combination of structural and nominal information produces bet- 
ter results. As we have adapted CLCMiner’s algorithm to work on functions 
instead of diffs, it relies on having a parser to extract the functions and does 
not rely on a version control system. We refer to our nominal-only adaptation 
of CLCMiner’s algorithm as “Nominal” for the rest of the paper. 


3 Overview 


In this section we provide a high-level overview of FETT and justify some of
our steps. We give an end-to-end example of our clone detection
process in our tech report [18]. FETT’s pipeline is: 


1. Take as input a corpus of source code (which may exist in multiple languages); 

2. Using existing ANTLR grammars, parse and create a separate parse tree for 
each function (we currently handle C++, Java, and JavaScript); 

3. Simplify parse trees that have an unnecessarily large depth; 

4. Abstract the multilingual parse trees into a common representation to facili- 
tate comparison; 

5. Linearize the resulting trees using a preorder traversal; 

6. Compare all linearized function pairs using a Smith-Waterman local sequence 
alignment algorithm; and finally 

7. Present the pairwise similarity scores to the user. 


The following sections fill in the details of the structural and nominal aspects 
of FETT’s cross-language clone detection process. 


4 Structural Clone Detection 


One key insight of our structural algorithm is that abstract syntax trees (ASTs), 
which eliminate details in the concrete parse trees about how exactly the input 
was parsed or what language it came from, tend to look more similar for similar 
code even across languages. Unfortunately, ASTs are not part of a language’s 
specification, and AST grammars and formats are implementation dependent. 
We are not aware of any single compiler that has frontends for the variety of 
languages that we compare. Our structural clone detection algorithm processes 
reduced parse trees (Sect. 4.1) to eliminate nonessential details about parsing and 
obtain a structure similar to ASTs. 

Another source of disparity between trees generated by two grammars is that 
the nonterminals are different. The other key insight of our structural algorithm 
is that abstracting reduced parse trees by putting nonterminals in equivalence 
classes (Sect. 4.2) strikes a balance between preserving necessary information
and smoothing out differences across languages. 

Our structural algorithm proceeds by extracting functions from an abstracted
parse tree and then computing similarity scores between functions using the
Smith-Waterman local sequence alignment algorithm.
Flattening a tree using a preorder traversal helps smooth out most remaining
inconsistencies between inter-language reduced parse trees. To demonstrate the 
dissimilarities due to grammatical differences that preorder traversal removes, 
see Fig. 3: a grammar that uses nested if statements will have a parse tree like 
Fig. 3b, while a grammar that uses unnested if statements will look more like 
Fig. 3c. As the else if cases become more numerous in the first grammar the 
nesting becomes more severe, emphasizing the differences in the resulting parse 
trees. 
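The effect of preorder flattening on the two grammar styles can be checked on toy trees. In this sketch the tree encoding `(label, children)`, the `elif` leaf in the unnested tree, and the helper names are all illustrative assumptions rather than FETT's actual parse-tree format. As the number of else-if cases grows, the nested tree gets deeper while the flat one stays at depth 2, yet their preorder sequences have the same length and differ only at the keyword positions.

```python
from typing import List, Tuple

# Hypothetical tree encoding: (label, children); leaves have no children.
Node = Tuple[str, List["Node"]]


def leaf(label: str) -> Node:
    return (label, [])


def nested_if(k: int) -> Node:
    """G1-style tree: each else-if nests a new 'if' inside the else branch."""
    if k == 0:
        return leaf("block")  # final else block
    return ("if", [leaf("exp"), leaf("block"), nested_if(k - 1)])


def flat_if(k: int) -> Node:
    """G2-style tree (k >= 1): all elif branches hang off a single 'if' node."""
    children = [leaf("exp"), leaf("block")]
    for _ in range(k - 1):
        children += [leaf("elif"), leaf("exp"), leaf("block")]
    return ("if", children + [leaf("block")])


def preorder(t: Node) -> List[str]:
    label, children = t
    out = [label]
    for c in children:
        out += preorder(c)
    return out


def depth(t: Node) -> int:
    return 1 + max((depth(c) for c in t[1]), default=0)
```

For instance, `preorder(nested_if(2))` is `["if", "exp", "block", "if", "exp", "block", "block"]`, while `preorder(flat_if(2))` differs only by having `"elif"` in the fourth position.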


if ( exp ) block [else block]                        (G1)
if exp : block [elif exp : block]* [else block]      (G2)

(a) Two different kinds of grammars for if statements.

[Parse-tree diagrams omitted: (b) an example parse tree using the nested if grammar (G1); (c) an example parse tree using the unnested if grammar (G2).]


Fig. 3. Grammars and parse trees for nested vs. unnested if statements.


4.1 Precedence Woes 


Some grammar definitions encode operator precedence into the grammar²,
whereas others use facilities provided by the parser generators to encode the 
precedence. Direct encoding of precedence causes spurious chains of nontermi- 
nals in the resulting parse tree, which would be removed when the parse tree is 
converted to an AST. We collapse the chains of nonterminals encountered in a 
parse tree for the direct encoding case to remove the chains and mitigate this 
disparity between different styles of grammars. Figure 4 demonstrates the kinds 
of issues that are apparent when a grammar hard-codes precedence—because 
precedence in this case appears in the form of nested productions, we always 
see “AdditiveExpression” even when there is only a multiplication expression 
present; this will throw off any clone detector that is working directly on plain 
parse trees. 

If precedence is handled indirectly through the parser generator, then the 
resulting parse tree is much closer to an AST. This is an example of an issue 
that only arises in a cross-language setting, and which makes cross-language 
clone detection strictly more difficult than same-language clone detection. We 
condense any chains of nonterminals, and we refer to the parse trees after this 
stage as reduced parse trees. 
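Chain condensation can be sketched as a bottom-up rewrite. This is a minimal sketch assuming a hypothetical encoding in which terminals are strings and nonterminals are `(label, children)` pairs; it keeps the innermost node of each chain, which is our assumption since the text only says that chains are condensed. The test tree mimics the kind of chain shown in Fig. 4 with hypothetical node names.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a terminal is a str; a nonterminal is (label, children).
Tree = Union[str, Tuple[str, List["Tree"]]]


def reduce_chains(t: Tree) -> Tree:
    """Collapse chains of nonterminals whose only child is another nonterminal.

    The innermost node of each chain is kept (an assumption of this sketch).
    """
    if isinstance(t, str):
        return t
    label, children = t
    children = [reduce_chains(c) for c in children]
    if len(children) == 1 and not isinstance(children[0], str):
        return children[0]  # skip this link of the chain
    return (label, children)
```

Applied to a tree for “5*7” that is wrapped in an AdditiveExpression node and has a CastExpression chain above each literal, the sketch yields a MultiplicativeExpression node whose operands are plain literal nodes.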


2 We encountered this only in the C++ grammar during our evaluation. 


254 L. Nichols et al. 


[Figure: a parse-tree subtree showing a chain of nested nonterminals, including repeated CastExpression nodes.]


Fig. 4. A subtree of the original C++ parse tree for the text “5*7”.


4.2 Abstracting Parse Tree Nonterminals 


Consider the two reduced parse trees for the expression binarySearch(array,
mid+1, high, x) in Figs. 5a and b. Although they look similar to the naked eye,
because the node names are different, even a tree edit distance algorithm would
say that the trees are not similar at all. We thus need to abstract the nonterminal
names while preserving essential information about the tree structure. After
performing this abstraction, we call the resulting parse trees abstracted parse
trees.


[Figure panels: (a) reduced parse tree from a Java parser; (b) reduced parse tree from a JavaScript parser; (c) abstraction of the trees in Figs. 5a and 5b.]


Fig. 5. Reduced parse trees for expression binarySearch(array, mid+1, high, x) in
Java and JavaScript, and their abstraction. The terminals are omitted for simplicity.


Our method instead groups node types with similar meanings across lan- 
guages, so that node types that “mean” similar things are in the same group. 
To do this, we manually categorize node types into equivalence classes once 
per pair of languages. For example, consider the equivalence classes
$c_1$ = {FunctionCall, ArgumentsExpression}, $c_2$ = {Primary, IdentifierExpression},
$c_3$ = {ArgumentList, ExpressionList}, $c_4$ = {NumericLiteral, Literal},
$c_5$ = {AdditiveExpression}, and the set $C = \{c_1, c_2, c_3, c_4, c_5\}$. After replacing each
node in Figs. 5a and b with its equivalence class in $C$, we end up with trees that
are exactly the same (Fig. 5c). The abstracted trees coincide in this specific example,
though this is not always the case in practice.

We define the abstraction algorithm in two parts: EqClassMapOf(C) pro- 
duces a map from each node to a symbol corresponding to its equivalence class. 
Abstract(tree, map) does the abstraction by traversing the given tree bottom up 
and applying the map. It removes the nonterminals which do not belong to any 
equivalence class. When the abstraction algorithm removes a node, it connects 
any children of the removed node to the removed node’s parent. 
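The two parts of the abstraction can be sketched as follows. The names `eq_class_map_of` and `abstract` mirror EqClassMapOf and Abstract but are our own illustrative implementations, and the tree encoding (terminals as strings, nonterminals as `(label, children)`) is an assumption. Because a removed node's children are spliced into its parent, `abstract` returns a list of trees.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical encoding: a terminal is a str; a nonterminal is (label, children).
Tree = Union[str, Tuple[str, List["Tree"]]]


def eq_class_map_of(classes: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each node type to the name of its equivalence class."""
    return {node_type: name for name, members in classes.items() for node_type in members}


def abstract(t: Tree, mapping: Dict[str, str]) -> List[Tree]:
    """Abstract a tree bottom up. Returns a list because a removed node is
    replaced by its (already abstracted) children, spliced into the parent."""
    if isinstance(t, str):
        return [t]  # terminals are kept as they are
    label, children = t
    new_children = [a for c in children for a in abstract(c, mapping)]
    if label in mapping:
        return [(mapping[label], new_children)]
    return new_children  # nonterminal in no class: drop it, keep its children
```

With the classes $c_1, \ldots, c_5$ above and two hypothetical trees for binarySearch(array, mid+1, high, x), one of which wraps its argument list in an extra unmapped node, both trees abstract to the same result.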


4.3 Sequence Alignment for Clone Detection 


Linearizing the trees via a preorder traversal of the nodes will remove most 
traces of the structural differences demonstrated in Fig. 3. Moreover, state-of-the-art
tree edit distance algorithms are not as scalable as sequence alignment
algorithms³. These observations led us to explore sequence alignment algorithms
as an alternative to tree-edit distance. Levenshtein distance is a popular choice 
in this category. Smith-Waterman is strictly more general than Levenshtein dis- 
tance, and it supports assigning weights to different elements in the sequence. 
Hence, we use the Smith-Waterman algorithm on preordered trees to compute 
similarity scores. We evaluate the precision and recall of both Smith-Waterman 
and tree edit distance in Sect. 6 and observe that sequence alignment performs
better in terms of precision and scalability. 

We convert function subtrees to sequences by computing the preorder traver- 
sal. Finally, we execute Smith-Waterman using custom weights on each sequence 
pair and normalize the resulting score using the normalization factor Z described 
below. We chose the weights based on the hypothesis that certain nodes like con- 
ditionals indicate important program structure, and should generally appear in 
the same order in a cloned pair of functions; therefore, we assign higher weights 
to penalize the function pairs in which this alignment does not occur. In the 
algorithm, the function SmithWaterman(a,b, M, g) computes a similarity score 
between two sequences a and b using the Smith-Waterman algorithm with sub- 
stitution matrix M and linear gap penalty coefficient g; a detailed explanation 
of these parameters can be found in [2]. 
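A minimal sketch of SmithWaterman(a, b, M, g) with a linear gap penalty follows, using the standard local-alignment recurrence in $O(|a| \cdot |b|)$ time. The substitution function `weighted_M` and its weights (a larger reward and penalty whenever a "structural" node such as `if` is involved) are hypothetical stand-ins for FETT's custom weights, which are not given here.

```python
from typing import Callable, Sequence

IMPORTANT = {"if", "for", "while"}  # hypothetical "structural" node types


def weighted_M(x: str, y: str) -> float:
    """Hypothetical substitution scores: structural nodes weigh three times more."""
    w = 3.0 if x in IMPORTANT or y in IMPORTANT else 1.0
    return w if x == y else -w


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   M: Callable[[str, str], float], g: float) -> float:
    """Best local alignment score of a and b with substitution scores M(x, y)
    and a linear gap penalty of g per gap position."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(0.0,
                         prev[j - 1] + M(a[i - 1], b[j - 1]),  # match/mismatch
                         prev[j] - g,                          # gap in b
                         cur[j - 1] - g)                       # gap in a
            best = max(best, cur[j])
        prev = cur
    return best
```

Only two rows of the dynamic-programming table are kept, since the unnormalized score needs the best cell value but not the alignment itself.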


Normalizing Smith-Waterman results. The result of the Smith-Waterman 
algorithm depends on the size of the input, and longer sequence pairs have 
higher scores. In order to find both short and long clones, we normalize the 
resulting similarity score from the Smith-Waterman algorithm to neutralize the 
bias towards longer clones. 

We define the self-similarity score of a sequence a as the score assigned to the pair (a, a) by the unnormalized Smith-Waterman algorithm, and denote this score S(a). We normalize the score assigned to a pair (a, b) by dividing it by Z, where Z = max{S(a), S(b)}. Note that Z is an upper bound for the score obtained by Smith-Waterman, and the score is equal to Z if and only if a = b. Thus, using the normalization factor Z is useful if one is looking for similar whole functions rather than for a small snippet in a larger piece of code.

3 APTED, the state-of-the-art tree edit distance algorithm, has a time complexity of O(n³) [20], whereas the variant of the Smith-Waterman algorithm we use is O(n²) [2].
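A minimal sketch of this normalization, assuming a plain Smith-Waterman scorer with hypothetical match/mismatch/gap scores of 2/-1/-2 in place of FETT's weighted one. It shows why Z = max{S(a), S(b)} favours whole-function similarity: a sequence that appears verbatim inside a longer one still scores only 0.5.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Unnormalized Smith-Waterman local alignment score (illustrative scoring)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

def normalized_similarity(a, b, score=sw_score):
    """Divide score(a, b) by Z = max(S(a), S(b)), where S(x) = score(x, x)."""
    z = max(score(a, a), score(b, b))
    return score(a, b) / z if z > 0 else 0.0
```

With a = ABCD and b = ABCDEFGH, the raw score is 8, S(a) = 8 and S(b) = 16, so the normalized score is 0.5. Dividing by min{S(a), S(b)} instead would give this snippet pair a score of 1.0, which is the behaviour the max-based factor avoids.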


5 Hybrid Algorithm 


Combining nominal and structural clone detection in a cross-language setting provides the best of both worlds and mitigates the weaknesses of either detection method used on its own.

Identifier names carry some meaning about the programmer's intent and give a code snippet context. On the other hand, the structure of code (conditionals, loops, function calls, etc.) also carries information about programmer intent. Without this structural information, we might misidentify two pieces of code as clones. Our hybrid algorithm is guided by structural information while consulting the Nominal algorithm to use local context within structurally similar pieces of code.


5.1 Our Nominal Algorithm 


We have adapted CLCMiner's algorithm to work on functions and use it as our purely Nominal algorithm. For a given pair of functions (f1, f2), our nominal matching algorithm consists of two parts.

The first part takes a function f, removes the comments and splits the tokens 
on each non-letter character (such as underscores or dashes). It then splits the 
camel case tokens into words and converts them to lowercase—each function 
becomes a bag of words that is represented by a characteristic vector, which holds 
the number of occurrences of each word. We denote the resulting characteristic 
vector as v( f). 
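The first part can be sketched in Python as follows. This is an illustration, not CLCMiner's or FETT's code; in particular, stripping only C-style comments is a simplifying assumption, since comment syntax is language-specific.

```python
import re
from collections import Counter

def characteristic_vector(source: str) -> Counter:
    """Bag of lowercase words with occurrence counts, i.e., v(f) for one function."""
    # Remove C-style comments (assumption: real comment syntax depends on the language).
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    words = []
    # Split on every non-letter character (underscores, dashes, digits, punctuation).
    for token in re.split(r"[^A-Za-z]+", source):
        # Split camel-case tokens, keeping acronyms such as HTTP together.
        for word in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token):
            words.append(word.lower())
    return Counter(words)
```

For the snippet int getUserName(user_id) { // lookup followed by return userName; }, the comment is dropped and the resulting vector counts user three times and name twice.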

The second part of the algorithm computes a normalized distance between two characteristic vectors v1, v2 according to the formula

$$ d(v_1, v_2) = \frac{\lVert v_1 - v_2 \rVert_1}{\lVert v_1 \rVert_1 + \lVert v_2 \rVert_1}, $$

where $\lVert \cdot \rVert_1$ is the $\ell_1$ norm (i.e., the sum of the absolute values of every entry in the vector). This algorithm computes a distance between two given functions; to make it comparable to the other algorithms, we use $1 - d(v_1, v_2)$ as a similarity score.
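A direct transcription of this distance over word-count dictionaries (Python's collections.Counter). Treating two empty bags as identical (distance 0) is our own assumption, since the formula is undefined in that case.

```python
from collections import Counter

def l1_norm(v: Counter) -> int:
    return sum(abs(n) for n in v.values())

def nominal_distance(v1: Counter, v2: Counter) -> float:
    """d(v1, v2) = ||v1 - v2||_1 / (||v1||_1 + ||v2||_1); absent words count as 0."""
    diff = sum(abs(v1[w] - v2[w]) for w in set(v1) | set(v2))
    total = l1_norm(v1) + l1_norm(v2)
    return diff / total if total else 0.0  # two empty bags: treated as identical (assumption)

def nominal_similarity(v1: Counter, v2: Counter) -> float:
    return 1.0 - nominal_distance(v1, v2)
```

For v1 = {user: 2, name: 1} and v2 = {user: 1, id: 1}, the numerator is 1 + 1 + 1 = 3 and the denominator is 3 + 2 = 5, so d = 0.6 and the similarity is 0.4. Because the numerator never exceeds the denominator, d always lies in [0, 1].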


5.2 Full Algorithm 


Our full algorithm is provided in our tech report [18]. It is a combination of the 
structural and nominal algorithms: we linearize the parse trees, and consecutive 
terminal nodes become bags of words. Nonterminals are compared using our 
structural method, and bags of words are compared using our nominal method. 
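The following Python sketch shows one way such a combination could look; it is not the algorithm from [18]. All of its choices are hypothetical: nonterminals are (label, children) tuples and terminals are plain identifier strings; bags of words are Counters of lowercased tokens (without the camel-case splitting of Sect. 5.1); two bags are scored as multiplier × (1 − d) − 1 using the nominal distance d, two nonterminals as +1/−1 on label equality, and a bag never matches a nonterminal.

```python
from collections import Counter

def linearize(tree):
    """Preorder-linearize a tree; runs of consecutive terminals become one bag of words."""
    seq = []
    def visit(node):
        if isinstance(node, str):                   # terminal (identifier token)
            if seq and isinstance(seq[-1], Counter):
                seq[-1][node.lower()] += 1          # extend the current bag
            else:
                seq.append(Counter([node.lower()]))
        else:                                       # nonterminal: (label, children)
            label, children = node
            seq.append(label)
            for child in children:
                visit(child)
    visit(tree)
    return seq

def nominal_similarity(v1, v2):
    total = sum(v1.values()) + sum(v2.values())
    diff = sum(abs(v1[k] - v2[k]) for k in set(v1) | set(v2))
    return 1.0 - diff / total if total else 1.0

def substitute(x, y, multiplier=2.0):
    """Hypothetical hybrid substitution score."""
    if isinstance(x, Counter) and isinstance(y, Counter):
        return multiplier * nominal_similarity(x, y) - 1.0   # bags: nominal method
    if isinstance(x, str) and isinstance(y, str):
        return 1.0 if x == y else -1.0                      # nonterminals: structural
    return -1.0                                             # bag vs. nonterminal

def hybrid_score(a, b, gap=-1.0):
    """Smith-Waterman local alignment over mixed sequences of labels and bags."""
    H = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0.0, H[i - 1][j - 1] + substitute(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Two functions that share their structure and the identifiers of a call but return differently named variables align on four elements and score 4.0; the differing return bags are excluded by the local alignment.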


6 Evaluation 


In this section we compare our work against existing work on both cross-language 
and same-language clone detection. 
6.1 Implementation and Environment


We have implemented our tool FETT in Scala and used the ANTLR parser 
framework as its front end, so that any language with an ANTLR grammar can 
be easily connected. 

To test whether FETT can handle same-language clone detection with accuracy similar to that of specialized, language-specific tools, we configured NiCad 4.0 [23] to work at function-level granularity and experimented with configurations until we found the best-performing one for our tests⁴.

Because we are comparing parse trees, we also want to determine how well we compete against state-of-the-art tree edit distance algorithms, so we compare against APTED [19,20] on one data set. We normalize the similarities using the method described in [17]; as this normalization method requires a metric distance, we could not introduce weights for matches, although we can still weight mismatches. We found that the parameters mismatch = 1, deletion = insertion = 5, and match = 0 gave us the best results overall.

We chose the threshold for ignored functions (defined in Sect. 4.3) to be θ = 35 for every experiment; the exact tolerance parameters for each case are given below. We used the same set of equivalence classes with the same weights for all cases: conditional, loop, return, and function call were all weighted 5; assignments were weighted 2; and all other considered nodes were weighted 1.

Our experiments were run on a computer with an Intel i7 4790 3.6 GHz 
processor. FETT, Structural, Tree Edit Distance, and Nominal were given 8 GB 
maximum heap size and were set to use 4 threads. 


6.2 Methodology 


We used the standard statistical metrics of precision, recall, and F-measure to 
quantitatively assess the effectiveness of our different techniques. 
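For concreteness, a small sketch of how these metrics follow from similarity scores and a cutoff. Treating a pair as a reported clone when its score is at least the cutoff is our assumption.

```python
def precision_recall_f(scores, is_clone, cutoff):
    """Precision, recall and F-measure when every pair with score >= cutoff is reported."""
    reported = [s >= cutoff for s in scores]
    tp = sum(r and c for r, c in zip(reported, is_clone))      # true positives
    fp = sum(r and not c for r, c in zip(reported, is_clone))  # false positives
    fn = sum(c and not r for r, c in zip(reported, is_clone))  # missed clones
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

With scores 0.9, 0.7, 0.45, 0.3 for pairs labelled clone, non-clone, clone, non-clone and a cutoff of 0.4, three pairs are reported (two true clones), so precision is 2/3, recall is 1, and the F-measure (their harmonic mean) is 0.8.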

Due to the sheer number of possible clone candidates in large projects, it is difficult to manually obtain complete ground truth for clones in real-world programs. Hence, we created two separate data sets for evaluation:


Manual programs set (handwritten set). We implemented a set of small programs in different languages to create a setting in which we have complete knowledge of whether each pair of functions is a clone pair. Statistics about the code are in Table 1.


Randomly sampled program set (large set). We chose four libraries that have implementations in different languages and set the tolerance parameters⁵ defined in our algorithm (see [18]) to give the best results on a per-language


4 NiCad: threshold = 0.5, minsize = 4, maxsize = 2500, rename = blind, filter = none, abstract = none, normalize = none.

5 For FETT: u = 6 (match coefficient) and g = -4 (gap penalty) for the case of comparing Java and JavaScript, (u, g) = (9, -1) for Java/C++ and JavaScript/C++, and (8, -3) for Java/Java. The nominal multiplier was set to 2 for all but the Java/C++ and JavaScript/C++ cases, where it was set to 3. For the Structural algorithm: (7, -1) for JavaScript/Java, (8, -4) for Java/C++, (0.5, -2) for Java/Java, and (9, -4) for JavaScript/C++.
Table 1. Statistics of handwritten clones.

Language pair        LoC         #Functions   #Pairs   #Clones
Java / JavaScript    201 / 177   12 / 11      132      11
Java / C++           201 / 195   12 / 12      144      12
JavaScript / C++     177 / 195   11 / 12      132      11


pair basis. We randomly sampled functions from files with the same names (ignoring extensions) and manually checked the pairs to create a sample with ground truth; this is essentially the sampling strategy used by Cheng et al. [8], applied to functions instead of diffs. We reused this sampling strategy because our evaluation is manual and our human resources are limited. The sample does not reflect the true distribution of clones, but function clone pairs are unlikely to be chosen in a standard uniform random sample; had we gone that route, our precision and recall scores would not have been meaningful. We are not aware of a better solution to this problem.

The first three libraries considered for this set are the ANTLR parser framework, version 4 [3]; the toxiclibs computational design library [26]; and the ZXing barcode image processing library [28]. To assess the efficacy of clone detection tools on independently written ports, we also considered two ports of the LAME MP3 encoding library that were written in different languages by different developers: lamejs, a JavaScript port [16]; and java-lame, a Java port [12]. Statistics about the libraries are in Table 2.


Table 2. Statistics of libraries considered for evaluation. LoC: non-blank, non-comment lines of code; Fun's: number of functions found in each project; Nont'l (Nontrivial) Fun's: number of functions whose reduced parse trees exceed θ (the chosen threshold); Pairs: number of possible function pairs; Same-File Pairs: number of pairs of functions coming from files with the same name (ignoring extensions); Sel'd: number of selected pairs; Runtime: total time (H:M:S) to run our method. Cells marked ? are illegible in the source.


Data set    Library     Lang.        LoC      Fun's   Nont'l Fun's   Pairs       Same-File Pairs   Sel'd   Runtime   Clones
antlrj      ANTLR       Java         13,770   1,393   694            240,471     4,942             505     0:56:18   14
                        Java         13,770   1,393   694
antlrjsj    ANTLR       Java         13,770   1,393   694            281,070     6,240             663     0:25:01   45
                        JavaScript   7,323    728     405
antlrcppjs  ANTLR       C++          ?        ?       ?              194,400     3,762             752     0:17:11   17
                        JavaScript   7,323    728     405
toxic       toxiclibs   Java         ?        ?       ?              5,004,076   11,637            1,060   3:01:12   63
                        JavaScript   36,976   4,108   2,321
zxing       ZXing       Java         38,908   ?       1,689          684,045     1,388             254     2:10:51   45
                        C++          22,784   866     405
lame        java-lame   Java         ?        ?       ?              101,152     4,645             873     0:27:37   34
            lamejs      JavaScript   11,112   285     232
6.3 Results


For our main set of tests, we compare FETT against (1) our purely Structural algorithm (i.e., no token similarity) and (2) our Nominal algorithm. We also apply the APTED tree edit distance algorithm combined with our abstraction method on our handwritten data set. Because tree edit distance takes at least an order of magnitude longer than the other tools and performed poorly on the handwritten tests, we did not evaluate it on the large data set. We use NiCad on the Java/Java same-language case of our large data set.


Cumulative clone ratios. We look at the graphs of cumulative clone distributions to choose a good cutoff point for each of the three techniques. These graphs were originally used in [8], and they are meant to give an intuition about where a clone detector separates clones from non-clones.

Similarity vs. cumulative clone ratio graphs track the ratio of clones to non-clones as the similarity score varies from 1.0 to 0. For example, at point 0.4 on the similarity axis, we plot the ratio of clones to non-clones among all samples with similarity scores > 0.4. A successful clone detector would have a similarity value at which this ratio drops significantly, and that value would be the optimal cutoff point. A clone detector may not assign very high scores to any pairs; in such cases, we start the plot from the first nonempty bin. Figure 7 shows the cumulative clone ratios for antlrj and toxic; graphs of the other test cases are omitted because of space constraints, but they have a similar overall shape. We chose a cutoff point for each clone detector based on the drops in these graphs (e.g., we chose a cutoff point of 0.4 for FETT's Java/Java case). The relative shape of the graph is more important than the absolute scores: compressing or stretching the similarity scores only affects the choice of the optimal cutoff point.
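The curve described above can be computed as in the sketch below. Reporting an infinite ratio when a bin contains no non-clones is our own convention, and the thresholds to evaluate are supplied by the caller.

```python
import math

def cumulative_clone_ratios(scores, is_clone, thresholds):
    """For each threshold t (from high to low), return (t, clones / non-clones)
    over all samples with score > t; empty bins are skipped."""
    curve = []
    for t in thresholds:
        above = [c for s, c in zip(scores, is_clone) if s > t]
        if not above:
            continue  # start the plot at the first nonempty bin
        clones = sum(above)
        non_clones = len(above) - clones
        curve.append((t, clones / non_clones if non_clones else math.inf))
    return curve
```

A cutoff would then be placed where consecutive ratios drop sharply; in a sample whose top-scoring pairs are all clones, the ratio stays infinite until the first non-clone enters the cumulative set.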


Handwritten test set. When evaluating the manually created (handwritten) data set, we used the same parameters u = 7, g = -2 for all pairs of functions in the data set and considered the combined results for both FETT and the Structural algorithm. FETT had its nominal multiplier set to 2. Figure 6 shows the clone distributions of the different clone detection methods for the handwritten program set, and Table 3 gives precision, recall, and F-measure (the harmonic mean of precision and recall) for this set. FETT and the Structural algorithm had a cutoff of 0.5, and the Nominal algorithm's cutoff was 0.6.


Handwritten test set discussion. The table and the figures paint a similar picture. Both FETT and the Structural algorithm seem to perform best on this data set: their graphs show a high clone ratio at high similarity scores and a sharp decline as the similarity score decreases. The Nominal algorithm has a less sharp drop, which indicates that it assigns mid-range similarity scores with low precision. It is also notable that tree edit distance performs so poorly; we believe that this is because we could not assign weights to matches, as described above.
Fig. 6. Cumulative clone ratio distribution for handwritten programs. Results of FETT and Structural coincide.


Large test set. We now present and discuss all the cross-language results for our large test set. Because the same-language case differs from the cross-language cases, the reader should consult Fig. 7b, which is representative of all the cross-language cases, rather than Fig. 7a.

Cutoffs were chosen per language pair to maximize each tool's score. For FETT, we used a cutoff of 0.4 for the three JavaScript/Java test cases and the Java/C++ test case, and a cutoff of 0.5 for the rest. For the Structural algorithm, we used a cutoff of 0.6 for JavaScript/Java, 0.5 for Java/C++ and JavaScript/C++, and 0.4 for Java/Java. For the Nominal algorithm, we used a cutoff of 0.5 for JavaScript/C++ and 0.6 for the rest.

Figure 8 shows precision, recall and F-measure of all the tools we compared 
for each data set and provides a visual and quantitative assessment of efficacy 
of all the techniques. 


Large test set discussion. Clone ratios relate most closely to the precision scores for each data set, and the results suggest that the Structural algorithm generally has the upper hand in this area: applying the intuition described above, the Structural algorithm's curve drops at the sharpest angle in most cases. This is plausible, as pieces of code that look structurally similar across languages are generally prime candidates for clones. Precision is, of course, not the whole story. FETT takes the best of both the nominal and structural worlds, and its F-measure is always the highest. The toxiclibs case is an outlier for the Structural algorithm: it contains more structural differences between clones, which FETT's hybrid structural/nominal algorithm was able to make up for.


Same-language test case. To assess performance on same-language clones, 
we compared our tool with NiCad on the Java version of ANTLR. Returning to 
the same figures, the antlrj case is quite similar to the other language pairs in 
terms of precision, recall, and F-measure, which demonstrates that our tool is 
capable of holding its ground in a same-language setting. 

FETT performs slightly worse than NiCad (by one percentage point in terms of F-measure). This result is not surprising, because NiCad uses more information about the code, whereas we deliberately discard some information by abstracting parse trees to work in a cross-language setting. Even with this filtering of parse trees, FETT's F-measure is very close to NiCad's, which shows that our tool is capable of producing results similar to those of a dedicated same-language tool.

Fig. 7. Similarity vs. cumulative clone ratio for the samples from the large open-source program set.

Fig. 8. Precision, recall and F-measure of clone detection tools on the large program set.


Overall results. We observe that FETT's hybrid algorithm consistently outperforms both the Nominal algorithm and the Structural algorithm in terms of F-measure in our large test set experiments.


Limitations. FETT may have difficulty scaling to repositories with large numbers of large functions: a run of FETT on the entire toxiclibs library (comparing every function pair, not just same-file pairs) takes 5.13 h, so further improvements will be required to support such use cases. One possible direction is a semi-automated approach in which the user applies her domain knowledge to pick out the files or functions to compare beforehand, or prunes the search space by telling the tool which modules are unrelated.


7 Conclusion 


We have presented FETT, a hybrid structural/nominal clone detection method that is capable of operating across programming languages and that is generic in the sense that it does not require the languages involved to belong to the same language family. It is syntax-based, uses ready-made grammar specifications, and requires minimal manual effort; the keys to the process are syntax abstraction and sequence alignment. We have provided a two-part evaluation of FETT and empirically demonstrated on multiple test sets that FETT is accurate in terms of the standard metrics of precision and recall. We also confirmed that our method is on a par with previous work when it comes to same-language clone detection, thus showing that it is more general than single-language methods while remaining competitive with them.
Abstract. In the Matlab Simulink environment, systems can be modelled using Simulink block diagrams and Stateflow state charts. While
stateful logic is more naturally modelled using Stateflow, in practice com- 
plex block diagrams are often used instead, resulting in models that are 
hard to understand and maintain. In order to improve the maintainabil- 
ity and understandability of large industrial models, this paper presents 
a strategy for refactoring Simulink block diagrams implementing stateful 
logic into functionally equivalent Stateflow state charts that more nat- 
urally represent the intended behaviour. To bridge the gap between the 
syntax of block diagrams and state charts, Mealy machines represented 
by tabular expressions are used as an intermediate representation. The 
compositional language of block diagrams is used to combine tables mod- 
elling individual blocks into a table for the entire block diagram which 
describes the high level state machine encoded in the Simulink subsys- 
tem. A prototype tool that performs the translation from Simulink to 
Stateflow automatically is discussed. 


Keywords: Simulink · Stateflow · Refactoring · Mealy machines · Tabular expressions · Monoidal categories


1 Introduction 


The adoption of Model-Based Design in the development of embedded control 
systems across industries has led to the wide use of Matlab/Simulink/Stateflow 
as a supporting environment. The modelling capabilities provided by Simulink 
block diagrams and Stateflow state charts complement each other by providing 
languages for functional and stateful system specifications. Due to their individ- 
ual strengths, one modelling formalism may be preferable for specifying certain 
classes of behaviours. For example, the MathWorks Automotive Advisory Board 
(MAAB) guidelines [25] advise the use of Stateflow over Simulink for modelling 
stateful logic. This is because Simulink block diagrams that are used to model 
mode switching logic are often cumbersome and difficult to understand. In this case, Stateflow state charts should be used to implement the same logic, resulting in a structure that is easier to read, maintain, and verify.

For example, each model in Fig. 1 executes periodically to update its state 
and outputs. When the block diagram in Fig. 1a updates, each signal line is
given a value and each block uses the values of the incoming signals to determine 
the values of the outgoing signals. When the state chart in Fig. 1b updates, it 
checks each condition on transitions leaving its current mode (i.e. state node). 
If a condition is satisfied, the state chart transitions to the associated target 
mode and executes the exit actions of the mode it is leaving, the actions on the 
transition it is taking, and the entry actions of the mode it is entering. If no 
transitions are valid, the state chart remains in its current mode and executes 
the during actions of that mode. 
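The update step just described can be sketched in Python. Modes, guards and actions are modelled as plain callables over a variable dictionary; taking the first enabled transition in list order is an assumption standing in for Stateflow's execution-order rules, and the countdown chart below is hypothetical rather than a transcription of Fig. 1b.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Env = Dict[str, object]
Action = Callable[[Env], None]
Guard = Callable[[Env], bool]

def noop(env: Env) -> None:
    pass

@dataclass
class Mode:
    entry: Action = noop
    during: Action = noop
    exit: Action = noop
    # Outgoing transitions as (condition, transition action, target mode).
    transitions: List[Tuple[Guard, Action, str]] = field(default_factory=list)

def step(chart: Dict[str, Mode], current: str, env: Env) -> str:
    """One periodic update: take the first enabled transition, else run 'during'."""
    mode = chart[current]
    for guard, action, target in mode.transitions:
        if guard(env):
            mode.exit(env)            # exit actions of the mode being left
            action(env)               # actions on the transition being taken
            chart[target].entry(env)  # entry actions of the mode being entered
            return target
    mode.during(env)                  # no transition enabled: stay and run 'during'
    return current

# Hypothetical countdown chart (counts down from 3 instead of 10).
chart = {
    "Stopped": Mode(
        entry=lambda e: e.update(running=False),
        transitions=[(lambda e: e["start"], lambda e: e.update(counter=3), "Running")]),
    "Running": Mode(
        entry=lambda e: e.update(running=True),
        during=lambda e: e.update(counter=e["counter"] - 1),
        transitions=[(lambda e: e["counter"] <= 1, noop, "Stopped")]),
}
```

Starting in Stopped with start true, the first step takes the transition to Running (setting counter to 3 and running to true); subsequent steps run the during action until counter reaches 1, at which point the chart returns to Stopped.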


(a) Before: Simulink Block Diagram    (b) After: Stateflow State Chart


Fig. 1. Model of a timer in Simulink and Stateflow. 


The Simulink and Stateflow models shown in Fig. 1 are functionally equiv- 
alent. Both models capture a timer with one boolean input, start, and one 
boolean output, running. When start becomes true, the system starts count- 
ing down from ten to zero. While the system is counting down, running is true. 
Once the counter reaches zero, running is set to false and becomes true again 
if start is true. Although there are relatively few blocks in Fig. 1a, it is difficult to understand how this model achieves the behaviour, whereas the state chart in Fig. 1b clearly captures the system's modes and the conditions triggering mode changes.

Our industrial experience has identified the need to refactor Simulink block 
diagrams to Stateflow state charts for easier comprehension and maintenance. 
More precisely, practice shows that Simulink is often used to specify stateful logic 
even though Stateflow would be a more appropriate implementation language. 
This might occur during model evolution when modes of operation are added 
to previously mode-free block diagrams, and developers find it easier to modify 
the existing Simulink logic to accommodate the change than to reproduce the 
behaviour from scratch in a state chart. Other times, a developer’s preference 
dictates the choice of modelling formalism. Manual refactoring from Simulink to Stateflow, although feasible, is a time-consuming and error-prone process that requires the behaviour of complex Simulink models to be completely understood.
This paper presents an approach to translate block diagrams into
behaviourally equivalent state charts. The approach converts individual blocks 
into tabular expressions [21] to expose their latent state variables and decision 
logic. The data flow between blocks is then used to combine tables into a single, 
larger table describing the entire block diagram. Then, the elements of state 
charts (states, transitions) are identified by reconfiguring the combined tables 
into a form similar to state charts. Behavioural equivalence is established by 
giving semantics to block diagrams, state charts, and the intermediate tables as 
Mealy machines. The paper’s main contributions are: (i) A method for translat- 
ing Simulink block diagrams to Stateflow state charts via tabular expressions. 
(ii) A categorical framework for composing Mealy machines by combining their 
update functions as the basis of the translation. (iii) A prototype tool imple- 
menting the translation from Simulink to Stateflow. 

This paper is organized as follows. Section 2 describes how we model systems and our categorical framework for combining them. Section 3 illustrates
the translation method with a simple example. Section 4 describes the applica- 
tion of the categorical framework to convert block diagrams to tabular expres- 
sions. Section 5 explains how tabular expressions are converted to state charts. 
Section 6 describes the prototype tool. Related work is covered in Sect. 7 and the 
paper concludes with Sect. 8. 


2 Background: Modelling Systems and Their Combinations


This section describes the formalisms underlying the proposed translation app- 
roach: Mealy machines, tabular expressions, and monoidal categories. 


2.1 Mealy Machines: Modelling Stateful Systems 


To preserve behaviour, the semantics of both block diagrams and state charts are modelled using Mealy machines.


Definition 1. A Mealy machine m is a tuple $(S, s_0, X, A, ud)$, where S is a set of states (the state space), $s_0 \in S$ (the initial state), X is a set of input values (the input alphabet), A is a set of output values (the output alphabet), and $ud : X \times S \to A \times S$ is a function (the update function) which computes the current output and next state from the current input and current state.


For example, the unit delay¹ block labelled counter in Fig. 1a can be modelled as the Mealy machine $delay = (\mathbb{R}, 0, \mathbb{R}, \mathbb{R}, shift)$. The block has an input variable (port) i, an output variable (port) o, and an internal state variable counter, where $i, o, counter \in \mathbb{R}$. When the block updates, it outputs the current state value $o = counter$ and updates the state to store the current input value $counter' = i$, i.e., $(o, counter') = shift(i, counter)$, where $shift : \mathbb{R}^2 \to \mathbb{R}^2$ is defined as $shift(i, counter) = (counter, i)$.
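A minimal Python rendering of Definition 1, with the unit delay as an instance. The sets S, X and A are left implicit in the types of s0 and ud (an assumption of this sketch), and run simply threads the state through a finite input sequence.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # state space
X = TypeVar("X")  # input alphabet
A = TypeVar("A")  # output alphabet

@dataclass
class Mealy(Generic[S, X, A]):
    """A Mealy machine; S, X and A are implicit in the types of s0 and ud."""
    s0: S
    ud: Callable[[X, S], Tuple[A, S]]  # (input, state) -> (output, next state)

    def run(self, inputs: Iterable[X]) -> List[A]:
        state, outputs = self.s0, []
        for x in inputs:
            out, state = self.ud(x, state)
            outputs.append(out)
        return outputs

def shift(i: float, counter: float) -> Tuple[float, float]:
    """Unit delay update: output the stored value, store the current input."""
    return counter, i

delay = Mealy(s0=0.0, ud=shift)
```

Feeding the inputs 1.0, 2.0, 3.0 into delay yields the outputs 0.0, 1.0, 2.0: each input reappears one step later, which is exactly the behaviour of a unit delay.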
While Simulink has no formal semantics, our use of Mealy machines to model the behaviour of Simulink models is consistent with the informal semantics described in Chap. 3 of the Simulink User Guide [26].


2.2 Tabular Expressions: Representing Conditional Behaviours 


Both block diagrams and state charts can specify decision logic, but in rather dis- 
tinct ways. We unify the presentation of decision logic in the two formalisms using 
two similar forms of tabular expressions: horizontal condition tables (HCTs) as 
presented in [28]; and state transition tables (STTs), which specialize HCTs to 
describe state charts similarly to the ones presented in [24]. 


(a) Horizontal Condition Table

                           || running | counter'
start    | counter > 0     || true    | counter − 1
         | counter ≤ 0     || false   | 10
¬start   | counter > 0     || true    | counter − 1
         | counter ≤ 0     || false   | 0

(b) State Transition Table

Source   | Condition        || running | counter'    | Target
Running  | counter − 1 > 0  || true    | counter − 1 | Running
         | counter − 1 ≤ 0  || true    | counter − 1 | Stopped
Stopped  | start            || false   | 10          | Running
         | ¬start           || false   | 0           | Stopped


Fig. 2. Intermediate representations 


Figure 2a shows an HCT. It is a tabular representation of the update function of a Mealy machine that models the block diagram from Fig. 1a. Given the variable values start = true and counter = 0, the table can be evaluated from left to right in the following way. Since the first condition, start, of the first column is satisfied, and the sub-condition counter ≤ 0 in the second row of the second column is satisfied, we use the second row to determine that running is given the value false and counter' is given the value 10.
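As a sketch, the HCT can be read as nested conditionals, one level per condition column. The function name and the Python encoding are ours; the rows follow the timer HCT of Fig. 2a.

```python
def timer_hct(start: bool, counter: float) -> tuple:
    """Update function encoded by the timer HCT: returns (running, counter')."""
    if start:                        # first column: start
        if counter > 0:              # second column: sub-conditions
            return True, counter - 1
        return False, 10.0           # counter <= 0
    if counter > 0:                  # first column: not start
        return True, counter - 1
    return False, 0.0                # counter <= 0
```

The evaluation described above corresponds to timer_hct(True, 0), which takes the start branch and then the counter ≤ 0 row, returning (False, 10.0).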

STTs, the second tabular representation, also represent the update function of Mealy machines. Their special format closely matches the
state charts they model. For example, the STT in Fig. 2b represents the state 
chart in Fig. 1b. Each mode is listed in the first column, and the condition of 
each transition is listed in the second column, adjacent to the mode they leave. 
The columns after the double bars describe how each output/state variable is 
updated by the actions of the associated transition. The final column of each 
row indicates which mode the associated transition leads to. 
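A sketch of how an STT row is selected and applied, with the rows of the timer STT in Fig. 2b encoded as Python tuples (the encoding is ours). Rows are grouped by source mode, as in the table, so a step needs only the current mode and the current input and state values.

```python
# Rows of the timer STT: (source mode, condition, running, counter', target mode).
STT = [
    ("Running", lambda start, c: c - 1 > 0,  lambda c: True,  lambda c: c - 1, "Running"),
    ("Running", lambda start, c: c - 1 <= 0, lambda c: True,  lambda c: c - 1, "Stopped"),
    ("Stopped", lambda start, c: start,      lambda c: False, lambda c: 10.0,  "Running"),
    ("Stopped", lambda start, c: not start,  lambda c: False, lambda c: 0.0,   "Stopped"),
]

def stt_step(mode: str, start: bool, counter: float):
    """Apply the first row whose source is `mode` and whose condition holds.
    Returns (running, counter', target mode)."""
    for source, condition, running, next_counter, target in STT:
        if source == mode and condition(start, counter):
            return running(counter), next_counter(counter), target
    raise ValueError(f"no enabled row for mode {mode!r}")
```

For example, stt_step("Stopped", True, 0.0) selects the third row and returns (False, 10.0, "Running"), matching the second row of the HCT evaluated above.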

Tabular expressions were given a precise semantics in [10]. The structure 
of tables can be rearranged without changing the function they describe, e.g., 
conditions can be reordered as in [4]; conditions can be combined with sub- 
conditions (via conjunction) to flatten the hierarchy of conditions; and normal 
expressions in the table can be simplified by assuming the conditions to their 
left hold. 
2.3 Categorical Framework: Combining Systems


The key idea of block diagrams is to combine simple, predefined blocks to 
describe a behaviour. The language of monoidal categories explains how to break 
down the complex data flow of block diagrams and describe it in terms of simpler 
data flow [5] (i.e. cascading blocks in sequence, placing blocks in parallel, and 
feeding outputs of blocks back to their inputs). 

Monoidal categories describe data flow in an abstract setting where blocks 
are called morphisms. Simple data flow constructs are described as operations 
on morphisms, which can be visualized using block diagrams called string dia- 
grams [5,22]. In this section, we discuss the wiring constructs in the concrete 
setting of the category Set, where morphisms are functions from an input set 
of tuples to an output set of tuples (called the domain/codomain objects of the
morphism). 




Fig. 3. Functional fragment of timer example 


A fragment of the block diagram from Fig. 1a can be used to illustrate the idea behind the basic data flow operations. The string diagram in Fig. 3 describes a function that is broken down into sub-functions combined via two operations: sequential combination (denoted “;”) and parallel combination (denoted “⊗”). The fragment describes a function $g : \mathbb{R} \times \mathbb{B} \to \mathbb{R}$. Each wire extending from the left/right of the large compound function indicates an input/output value, respectively. The wire is labelled with the set from which the value comes. If there are multiple wires, the domain or codomain of the function is given as the Cartesian product of those sets. In monoidal categories, the Cartesian product is generalized as an operation called the monoidal product on objects.

The function g is composed of a sequence of sub-functions, $g = f_1; f_2; f_3; f_4$. The sub-functions (except for $f_4$) consist of functions composed in parallel with wires and other functions. The wiring “data routing functions” are then defined as follows: a normal wire is the identity function $\mathrm{id}_X = \{(x) \mapsto (x)\}$; wires crossing over each other define the braiding function $\mathrm{Br}_{A,B} = \{(a, b) \mapsto (b, a)\}$; and branching wires are called the diagonal function $\Delta_X = \{(x) \mapsto (x, x)\}$. The functions are indexed with the set(s) over which they are defined. Morphisms like these functions have special status in monoidal categories and must satisfy certain axioms ensuring that they “act like wiring” in the host category.

Sub-function $f_3$ can now be described as $f_3 = \mathrm{add} \otimes \mathrm{id}_{\mathbb{R}} \otimes \mathrm{id}_{\mathbb{R}}$. Functions combined in parallel have domains/codomains that are the Cartesian products of the domains/codomains of the component functions. The parallel combination uses each component function independently to calculate each component of the output. For example, taking $\mathrm{add} = \{(x_1, x_2) \mapsto (x_1 + x_2)\}$, the function $\mathrm{add} \otimes \mathrm{id}_{\mathbb{R}} \otimes \mathrm{id}_{\mathbb{R}}$ is given by $\{(x_1, x_2, x_3, x_4) \mapsto (x_1 + x_2, x_3, x_4)\}$. In monoidal categories this operation is generalized as the monoidal product on morphisms, where the domain/codomain of a product morphism is given by the monoidal product of the domain/codomain objects of the component morphisms. Notably, we can also describe sub-function $f_3$ as $f_3 = \mathrm{add} \otimes \mathrm{id}_{\mathbb{R}^2}$, where the two wires are treated as one function. This is useful, for example, when describing the sub-function $f_2$ as $f_2 = \mathrm{Br}_{\mathbb{R}^2,\mathbb{R}} \otimes \mathrm{sw}_{\mathbb{B}}$.

Describing $f_1$ requires modelling constant blocks as functions. Therefore, constants are described as functions with inputs from the singleton set $1 = \{()\}$, and we draw functions with domain/codomain 1 as blocks with no wires extending from the left/right side, respectively. Functions modelling constant blocks, $[k] = \{() \mapsto (k)\}$, always take the empty tuple as input and always produce the same value k as output. The function $f_1$ can now be described as $f_1 = \Delta_{\mathbb{R}} \otimes [-1] \otimes [10] \otimes \mathrm{id}_{\mathbb{B}} \otimes [0]$. Objects like 1 have special status in monoidal categories and are called the monoidal unit. Taking their monoidal product with any other object X yields the same object X. Intuitively, this means that concatenating any tuple $(x_1, \ldots, x_n)$ with the empty tuple () does nothing. This explains why the product of the domains of the functions in $f_1$ is the set $\mathbb{R} \times 1 \times 1 \times \mathbb{B} \times 1$, but the domain of $f_1$ is described as $\mathbb{R} \times \mathbb{B}$: the former simplifies to the latter.

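We now describe the entire function g in terms of simple data flow:

$$ g = (\Delta_{\mathbb{R}} \otimes [-1] \otimes [10] \otimes \mathrm{id}_{\mathbb{B}} \otimes [0]);\ (\mathrm{Br}_{\mathbb{R}^2,\mathbb{R}} \otimes \mathrm{sw}_{\mathbb{B}});\ (\mathrm{add} \otimes \mathrm{id}_{\mathbb{R}} \otimes \mathrm{id}_{\mathbb{R}});\ \mathrm{sw}_{\mathbb{R}} $$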
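To make the composition concrete, the following Python sketch evaluates g by composing ordinary functions in the category Set. Because Python functions do not carry their domains, each parallel component is paired with its arity (0 for constants, whose domain is the unit 1). The switch semantics are assumptions: sw_B passes its first data input when the boolean control is true, and sw_R passes its first data input when the control is strictly positive, a criterion chosen so that g agrees with the counter' column of the timer HCT in Fig. 2a.

```python
from typing import Callable, Tuple

def seq(*fs: Callable) -> Callable:
    """Sequential composition f1; f2; ...: each output tuple feeds the next function."""
    def composed(*xs):
        for f in fs:
            xs = f(*xs)
        return xs
    return composed

def par(*parts: Tuple[int, Callable]) -> Callable:
    """Parallel composition (monoidal product); each part is (arity, function)."""
    def product(*xs):
        out, k = (), 0
        for arity, f in parts:
            out += f(*xs[k:k + arity])  # arity 0: a constant consumes the empty tuple
            k += arity
        return out
    return product

ident = lambda x: (x,)                                 # id
diag = lambda x: (x, x)                                # diagonal
const = lambda k: (lambda: (k,))                       # [k], domain 1
br_r2_r = lambda a, b, c: (c, a, b)                    # Br_{R^2,R}: ((a, b), c) -> (c, (a, b))
add = lambda x1, x2: (x1 + x2,)
sw_b = lambda d1, ctrl, d2: (d1 if ctrl else d2,)      # assumed: d1 when control is true
sw_r = lambda d1, ctrl, d2: (d1 if ctrl > 0 else d2,)  # assumed criterion: control > 0

g = seq(
    par((1, diag), (0, const(-1.0)), (0, const(10.0)), (1, ident), (0, const(0.0))),  # f1
    par((3, br_r2_r), (3, sw_b)),                                                    # f2
    par((2, add), (1, ident), (1, ident)),                                           # f3
    sw_r,                                                                            # f4
)
```

For instance, g(3.0, False) returns (2.0,) because the counter is positive and is decremented, while g(0.0, True) returns (10.0,): the counter has expired and start reloads it.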


However, this example does not contain feedback loops. Loops are obtained when inputs and outputs of a function are connected by some common wire(s), such as the wire connecting the first input and first output of the inner box in Fig. 4a. Adding looping wires to a function $f : X \times A \to X \times B$ yields a new function $f^{*} : A \to B$ (e.g., the outer box in Fig. 4a), where $f^{*}(a) = b$ if there exists a unique $x \in X$ such that $f(x, a) = (x, b)$. When such an x exists for each $a \in A$, the loop configuration is considered well-formed. Following [11], we encode the addition of such loops with a trace operation: $\mathrm{Tr}^{X}_{A,B}(f) = f^{*}$.

For example, consider the function $f = \{(x, y) \mapsto (x + x, x + y)\}$. In the
function $\mathrm{Tr}^{\mathbb{R}}_{\mathbb{R},\mathbb{R}}(f)$ the trace applies the constraint that the first input is equal
to the first output (i.e. $x = x + x$), which has a unique solution: $x = 0$.
Given any $y \in \mathbb{R}$, $f(0, y) = (0, y)$, therefore $\mathrm{Tr}^{\mathbb{R}}_{\mathbb{R},\mathbb{R}}(f) = \{(y) \mapsto (y)\}$. This
approach uses fixed point equations to specify traces, and is generalized by
the approach from [8]. Since these fixed point equations are not guaranteed to
have a unique solution, the trace operation is partial: it is only defined for loop
configurations that are well-formed. Partial traces have been described in [15],
and the guarded structure introduced in [7] compositionally describes which
feedback configurations are valid. For the loops to “act like wiring”, certain
axioms must be satisfied, e.g., the yanking axiom (as shown in Fig. 4b) states
that $\mathrm{Tr}^X_{X,X}(\mathrm{Br}_{X,X}) = \mathrm{id}_X$ for any set $X$.
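The fixed-point definition of the trace can be sketched by brute force when the fed-back set $X$ is finite. Since $\mathbb{R}$ cannot be searched, this sketch assumes a finite range of integers as a stand-in for $X$; the function name `trace` is illustrative. Partiality shows up as a `ValueError` when there is no fixed point or more than one.

```python
from typing import Callable, Iterable, Tuple


def trace(f: Callable[[int, int], Tuple[int, int]],
          X: Iterable[int]) -> Callable[[int], int]:
    """Tr^X_{A,B}(f) for f : X x A -> X x B, searching a finite candidate set X.

    f*(a) = b iff exactly one x in X satisfies f(x, a) = (x, b); otherwise f* is
    undefined at a, which this sketch reports by raising ValueError.
    """
    candidates = list(X)

    def f_star(a: int) -> int:
        outputs = []
        for x in candidates:
            x_out, b = f(x, a)
            if x_out == x:            # the looping wire forces input x = output x
                outputs.append(b)
        if len(outputs) != 1:
            raise ValueError(f"{len(outputs)} fixed points at input {a!r}")
        return outputs[0]

    return f_star


xs = range(-100, 101)                          # ASSUMPTION: finite stand-in for R

tr_f = trace(lambda x, y: (x + x, x + y), xs)
print(tr_f(7))    # 7: only x = 0 satisfies x = x + x

yank = trace(lambda x, a: (a, x), xs)          # Br_{X,X} : (x, a) -> (a, x)
print(yank(3))    # 3: tracing the swap gives back id_X
```

The second example checks the yanking axiom on the chosen range; a function such as `lambda x, a: (x, a)` has a fixed point for every `x`, so its trace is undefined.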


Fig. 4. String diagrams for traced categories. (b) Yanking: $\mathrm{Tr}^X_{X,X}(\mathrm{Br}_{X,X}) = \mathrm{id}_X$


3 Translation Strategy 


The translation strategy is composed of three steps. This section illustrates these 
steps by considering the example from Fig. 1. 

First, the decision logic implemented by the block diagram is encoded as 
the HCT in Fig. 8a. This step is described in Sect. 4. In the second step, the
representation is simplified: depending on the value of counter, only some rows
of the table can be valid. By associating a certain range of state variable values 
with a mode of operation, we simplify the representation by considering only 
the conditions which are possible. This allows us to leverage the conditions from 
HCTs to determine the modes of operation by rearranging HCTs into equivalent 
STTs such as Fig. 2b. The final step trivially rearranges the information from 
STTs into a state chart by creating a transition for each row. The conversion from 
HCTs to STTs to state charts is described in Sect. 5, and possible simplifications 
to the resulting state chart are discussed. 

Even with such a simple example, the importance of automated refactoring 
becomes apparent. If the model were to be refactored manually, a state chart 
that is not equivalent to the block diagram could be created unintentionally. 
For example, one can manually produce a state chart that transitions out of the 
Running mode when counter is zero, rather than one. 


4 Block Diagrams to HCTs: Mealy Composition 


The first step of the translation strategy is to model the entire block diagram
as a Mealy machine whose update function is represented as an HCT. To achieve
this, Simulink block diagrams are modelled in a category Mealy, where morphisms
(i.e. blocks) are Mealy machines, not functions. We then show how
the update functions of composite Mealy machines built from the operations
described in Sect. 2.3 can be built from the update functions of the component
Mealy machines using the same operations on functions. Then, the predefined
update functions of individual blocks can be represented using HCTs and combined
according to the functional combinations derived from the block diagram.


4.1 Mealy Machines and Their Combinations via Functions 


In this section, we consider a category Mealy whose objects are sets, and whose 
morphisms $m : X \to A$ are Mealy machines with input alphabet $X$, and output
alphabet A. Composition of morphisms is given by the usual definition of cascade 
composition of Mealy machines [13]. We also introduce a monoidal product, 
giving the category a monoidal structure. It is defined on objects as the Cartesian 
product of sets, and on morphisms as the parallel composition of Mealy machines. 
The unit of the monoidal product is the same as for sets, the set containing one 
element: 1. Considering equality of morphisms up to bisimilarity results in a 
structure similar to the one used in [9] to describe symmetric lenses—according 
to [9], this structure forms a (symmetric) monoidal category. 

While the cascade/parallel composition of Mealy machines is well understood 
(see, e.g. [13]), we introduce a definition for the update functions of the composed 
machines which wires together the update functions of the individual machines. 
Because string diagrams are used to represent both Mealy machines and their 
update functions, let us introduce some graphical notation to differentiate them. 
For Mealy machines, the string diagrams use black boxes to denote component 
Mealy machines (e.g. Fig. 5a). The update function $ud$ of a Mealy machine $m$ can
be expressed using the projection mapping $\llbracket m \rrbracket_{ud} = ud$. For update functions,
the string diagram is decorated with grey backing to group the inputs/outputs of 
the update function into two main components: the upper components describe 
the inputs/outputs to the Mealy machine, and the lower components describe 
the current/next state (e.g. Fig. 5d). 


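Fig. 5. Composite Mealy machines and their update functions: (d) $\llbracket m_1; m_2 \rrbracket_{ud}$, (e) $\llbracket m_1 \otimes m_2 \rrbracket_{ud}$, (f) $\llbracket \mathrm{Tr}^O_{X,A}(m) \rrbracket_{ud}$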
Two Mealy machines $m_1 = (S_1, s_1^0, X, O, ud_1)$ and $m_2 = (S_2, s_2^0, O, A, ud_2)$
can be composed in sequence as illustrated by Fig. 5a to form the composite
Mealy machine $m_1; m_2 = (S_1 \times S_2, (s_1^0, s_2^0), X, A, ud')$. The update function $ud'$
for $m_1; m_2$, with the string diagram in Fig. 5d, is defined as:


$$\llbracket m_1; m_2 \rrbracket_{ud} = (\llbracket m_1 \rrbracket_{ud} \otimes \mathrm{id}_{S_2});\ (\mathrm{id}_O \otimes \mathrm{Br}_{S_1,S_2});\ (\llbracket m_2 \rrbracket_{ud} \otimes \mathrm{id}_{S_1});\ (\mathrm{id}_A \otimes \mathrm{Br}_{S_2,S_1})$$
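A minimal Python sketch of this wiring, assuming a Mealy machine is represented by an initial state and an update function `ud(x, s) -> (output, next_state)`; the class `Mealy`, the helpers `seq_ud` and `seq`, and the example machines `acc` and `delay` are hypothetical. Rather than manipulating flat tuples, the code performs the four stages of the equation directly on named values.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Update = Callable[[Any, Any], Tuple[Any, Any]]   # (input, state) -> (output, next state)


@dataclass
class Mealy:
    s0: Any        # initial state
    ud: Update     # update function [m]_ud

    def run(self, inputs: List[Any]) -> List[Any]:
        s, outputs = self.s0, []
        for x in inputs:
            y, s = self.ud(x, s)
            outputs.append(y)
        return outputs


def seq_ud(ud1: Update, ud2: Update) -> Update:
    """[m1; m2]_ud on composite states (s1, s2)."""
    def ud(x: Any, state: Tuple[Any, Any]) -> Tuple[Any, Tuple[Any, Any]]:
        s1, s2 = state
        o, s1_next = ud1(x, s1)        # [m1]_ud ⊗ id_S2
        a, s2_next = ud2(o, s2)        # id_O ⊗ Br_{S1,S2}, then [m2]_ud ⊗ id_S1
        return a, (s1_next, s2_next)   # id_A ⊗ Br_{S2,S1} restores the (S1, S2) order
    return ud


def seq(m1: Mealy, m2: Mealy) -> Mealy:
    return Mealy((m1.s0, m2.s0), seq_ud(m1.ud, m2.ud))


acc = Mealy(0, lambda x, s: (s + x, s + x))   # running sum of the inputs
delay = Mealy(0, lambda x, s: (s, x))         # outputs the previous input
print(seq(acc, delay).run([1, 2, 3]))         # [0, 1, 3]
```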


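The parallel composition of $m_1 = (S_1, s_1^0, X_1, A_1, ud_1)$ and $m_2 = (S_2, s_2^0, X_2, A_2, ud_2)$ is the Mealy machine $m_1 \otimes m_2 =
(S_1 \times S_2, (s_1^0, s_2^0), X_1 \times X_2, A_1 \times A_2, ud')$ as illustrated by Fig. 5b. The update
function $ud'$ for $m_1 \otimes m_2$, with string diagram Fig. 5e, is defined as:


$$\llbracket m_1 \otimes m_2 \rrbracket_{ud} = (\mathrm{id}_{X_1} \otimes \mathrm{Br}_{X_2,S_1} \otimes \mathrm{id}_{S_2});\ (\llbracket m_1 \rrbracket_{ud} \otimes \llbracket m_2 \rrbracket_{ud});\ (\mathrm{id}_{A_1} \otimes \mathrm{Br}_{S_1,A_2} \otimes \mathrm{id}_{S_2})$$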
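With a hypothetical `Mealy` representation (initial state plus `ud(x, s) -> (output, next_state)`), the parallel composition reduces to running both update functions side by side. The two braidings in the equation only regroup inputs, outputs, and states, which this sketch expresses with tuple unpacking; `par_ud`, `par`, `acc`, and `delay` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Update = Callable[[Any, Any], Tuple[Any, Any]]   # (input, state) -> (output, next state)


@dataclass
class Mealy:
    s0: Any
    ud: Update

    def run(self, inputs: List[Any]) -> List[Any]:
        s, outputs = self.s0, []
        for x in inputs:
            y, s = self.ud(x, s)
            outputs.append(y)
        return outputs


def par_ud(ud1: Update, ud2: Update) -> Update:
    """[m1 ⊗ m2]_ud with inputs (x1, x2) and states (s1, s2)."""
    def ud(x: Tuple[Any, Any], state: Tuple[Any, Any]):
        (x1, x2), (s1, s2) = x, state
        # id_X1 ⊗ Br_{X2,S1} ⊗ id_S2 regroups (x1, x2, s1, s2) as (x1, s1, x2, s2)
        a1, s1_next = ud1(x1, s1)
        a2, s2_next = ud2(x2, s2)
        # id_A1 ⊗ Br_{S1,A2} ⊗ id_S2 regroups (a1, s1', a2, s2') as (a1, a2, s1', s2')
        return (a1, a2), (s1_next, s2_next)
    return ud


def par(m1: Mealy, m2: Mealy) -> Mealy:
    return Mealy((m1.s0, m2.s0), par_ud(m1.ud, m2.ud))


acc = Mealy(0, lambda x, s: (s + x, s + x))   # running sum
delay = Mealy(0, lambda x, s: (s, x))         # previous input
print(par(acc, delay).run([(1, 10), (2, 20), (3, 30)]))   # [(1, 0), (3, 10), (6, 20)]
```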


Feedback configurations of Mealy machines (e.g., Fig. 5c) can be defined with
fixed-point equations, such as in [13]. We give an equivalent description in terms
of the trace operation in Set. A Mealy machine $m = (S, s_0, O \times X, O \times A, ud)$ can
be traced to form the machine $\mathrm{Tr}^O_{X,A}(m) = (S, s_0, X, A, ud')$ where the update
function $ud'$ is defined as $\llbracket \mathrm{Tr}^O_{X,A}(m) \rrbracket_{ud} = \mathrm{Tr}^O_{X \times S, A \times S}(\llbracket m \rrbracket_{ud})$, as illustrated by
Fig. 5f. Since this operation is defined in terms of traces in Set, many of the
properties of traces in Mealy can be derived from those of traces in Set.
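The traced machine can be sketched by applying a brute-force fixed-point search to the update function, assuming the fed-back alphabet $O$ is a small finite set (here a range of integers). The class `Mealy`, the helper `trace_ud`, and the machine `ud_m` are hypothetical; `ud_m` feeds back its stored state, so each step has exactly one fixed point.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Update = Callable[[Any, Any], Tuple[Any, Any]]   # (input, state) -> (output, next state)


@dataclass
class Mealy:
    s0: Any
    ud: Update

    def run(self, inputs: List[Any]) -> List[Any]:
        s, outputs = self.s0, []
        for x in inputs:
            y, s = self.ud(x, s)
            outputs.append(y)
        return outputs


def trace_ud(ud: Update, O: Iterable[Any]) -> Update:
    """[Tr^O_{X,A}(m)]_ud = Tr^O_{X x S, A x S}([m]_ud), searching a finite set O."""
    candidates = list(O)

    def ud_tr(x: Any, s: Any) -> Tuple[Any, Any]:
        sols = []
        for o in candidates:
            (o_out, a), s_next = ud((o, x), s)
            if o_out == o:                  # fed-back output must equal fed-back input
                sols.append((a, s_next))
        if len(sols) != 1:
            raise ValueError("feedback loop is not well-formed at this input/state")
        return sols[0]

    return ud_tr


# m : O x X -> O x A; it emits its state on the loop and o + x as its output.
ud_m = lambda ox, s: ((s, ox[0] + ox[1]), ox[1])
m_tr = Mealy(0, trace_ud(ud_m, range(-50, 51)))
print(m_tr.run([1, 2, 3]))   # [1, 3, 5]
```

The traced machine outputs the sum of the current and the previous input.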

The above results mean that if we know the update functions of individual 
Simulink blocks, then we can model the update functions of block diagrams 
which configure those blocks in sequence, in parallel, and with feedback. 


4.2 Functional Embedding and Wiring Morphisms 


In this section, we address the fact that a large part of a Simulink block diagram 
looks very functional (i.e. stateless). For example, many of the blocks and wiring 
in Fig. 1a can be modelled as functions. For this reason, we consider a class
of Mealy machines which produce outputs as a function of only their current
inputs. Any function $f : X \to Y$ can be described as the Mealy machine $Mf =
(1, (), X, Y, f)$, with one state, and update function $f$ (see Fig. 6a). The mapping
M embeds morphisms from Set into the category Mealy, because any two 
embedded functions Mf and Mg interact in Mealy very similarly to the way 
they interact as functions in Set. 
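The embedding can be sketched with a hypothetical `Mealy` class: `embed` builds the one-state machine $Mf$, using `()` as the single element of 1. The check that composing embedded functions behaves like embedding their composite is done on a sample run only, not proved; `inc` and `dbl` are illustrative functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Update = Callable[[Any, Any], Tuple[Any, Any]]   # (input, state) -> (output, next state)


@dataclass
class Mealy:
    s0: Any
    ud: Update

    def run(self, inputs: List[Any]) -> List[Any]:
        s, outputs = self.s0, []
        for x in inputs:
            y, s = self.ud(x, s)
            outputs.append(y)
        return outputs


def seq(m1: Mealy, m2: Mealy) -> Mealy:
    """Cascade composition m1; m2 with composite state (s1, s2)."""
    def ud(x, state):
        s1, s2 = state
        o, s1_next = m1.ud(x, s1)
        a, s2_next = m2.ud(o, s2)
        return a, (s1_next, s2_next)
    return Mealy((m1.s0, m2.s0), ud)


def embed(f: Callable[[Any], Any]) -> Mealy:
    """M f = (1, (), X, Y, f): the single state () is never changed."""
    return Mealy((), lambda x, s: (f(x), s))


inc = lambda x: x + 1
dbl = lambda x: 2 * x
lhs = seq(embed(inc), embed(dbl)).run([1, 2, 3])    # M inc ; M dbl
rhs = embed(lambda x: dbl(inc(x))).run([1, 2, 3])   # M (inc ; dbl)
print(lhs, rhs)   # [4, 6, 8] [4, 6, 8]
```

The two machines differ only in their trivial states (`((), ())` versus `()`), which is the kind of difference that equality up to bisimilarity ignores.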


Fig. 6. Embedded functions and their interactions: (a) $\llbracket Mf \rrbracket_{ud} = f$, (b) $M\mathit{sw}_{\mathbb{R}}; \mathit{delay}$, (c) $\llbracket M\mathit{sw}_{\mathbb{R}}; \mathit{delay} \rrbracket_{ud}$
This explains how functional aspects of Simulink block diagrams can be modelled
with Mealy machines. For example, the block labelled Mode in Fig. 1a can
be modelled with the Mealy machine $M\mathit{sw}_{\mathbb{R}}$. Perhaps more importantly, the
morphisms introduced to describe wiring in functional diagrams (i.e. $\mathrm{id}_X$, $\Delta_X$,
$\mathrm{Br}_{A,B}$) can again be used to describe the same (functional) wiring for Mealy
machines. Therefore, in string diagrams representing Mealy machines, plain wires
represent the morphism $M\mathrm{id}_X$ which carries data without changing it, branching
wires represent the morphism $M\Delta_X$ which duplicates data, and crossing
wires represent the morphism $M\mathrm{Br}_{A,B}$ which reorders the components of data.
The fact that $M\mathrm{id}_X$ and $M\mathrm{Br}_{A,B}$ “act like wiring” is established in [9].

This establishes how to model wiring and functional blocks in Simulink block 
diagrams as Mealy machines. We can now use the operations from Sect. 4.1 
to describe block diagrams which use complex wiring and functional blocks in 
combinations with stateful blocks. 


4.3 Block Diagrams to Horizontal Condition Tables 


We have explained how the categorical structure from Sect. 2.3 applies to Mealy, 
and related it to the same structure in Set. This framework allows us to 
combine update functions of individual blocks into update functions of entire 
block diagrams using the above definitions. For example, the update function 
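$\llbracket M\mathit{sw}_{\mathbb{R}}; \mathit{delay} \rrbracket_{ud}$ of the machine from Fig. 6b is equal to


$$(\llbracket M\mathit{sw}_{\mathbb{R}} \rrbracket_{ud} \otimes \mathrm{id}_{\mathbb{R}});\ (\mathrm{id}_{\mathbb{R}} \otimes \mathrm{Br}_{1,\mathbb{R}});\ (\llbracket \mathit{delay} \rrbracket_{ud} \otimes \mathrm{id}_1);\ (\mathrm{id}_{\mathbb{R}} \otimes \mathrm{Br}_{\mathbb{R},1}),$$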


as shown in Fig. 6c, where the “1” wire is drawn in grey to illustrate how it
achieves the data flow described by Fig. 5d (normally, this wire is not drawn).
This can be simplified, e.g., the final sequential sub-function $\mathrm{id}_{\mathbb{R}} \otimes \mathrm{Br}_{\mathbb{R},1}$ is given
by $\{(x, (y, ())) \mapsto (x, ((), y))\}$, which simplifies to $\{(x, y) \mapsto (x, y)\}$ by flattening
tuples. Our presentation of monoidal categories skips the formalities which
describe this simplification, but it can be intuitively understood by considering
the data flow described in Fig. 6c if the grey wire were absent (as usual). Taking
$\llbracket \mathit{delay} \rrbracket_{ud} = \mathit{shift}$ (as defined in Sect. 2.1), which we now describe as $\mathrm{Br}_{\mathbb{R},\mathbb{R}}$, and
using $\llbracket M\mathit{sw}_{\mathbb{R}} \rrbracket_{ud} = \mathit{sw}_{\mathbb{R}}$ along with appropriate axioms over the wiring
morphisms, $\llbracket M\mathit{sw}_{\mathbb{R}}; \mathit{delay} \rrbracket_{ud}$ simplifies to $(\mathit{sw}_{\mathbb{R}} \otimes \mathrm{id}_{\mathbb{R}}); \mathrm{Br}_{\mathbb{R},\mathbb{R}}$. This simplification
can be intuitively understood by considering only the black data flow in Fig. 6c.
In the same way that we describe the functional data flow of Fig. 3, this approach
can be repeated to describe the entire block diagram in Fig. 1a, not just
the combination of blocks labelled Mode and counter.
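The simplification of $\llbracket M\mathit{sw}_{\mathbb{R}}; \mathit{delay} \rrbracket_{ud}$ can be spot-checked numerically. The sketch below wires the two update functions with the sequential-composition rule from Sect. 4.1 and compares the result with $(\mathit{sw}_{\mathbb{R}} \otimes \mathrm{id}_{\mathbb{R}}); \mathrm{Br}_{\mathbb{R},\mathbb{R}}$ on a few sample inputs, discarding the unit state as the flattening step does. The switch semantics (pass $u_1$ when $u_2 > 0$, else $u_3$) is an assumption, and testing samples is of course not a proof.

```python
def sw(u):
    """ASSUMPTION: sw_R(u1, u2, u3) passes u1 when the control u2 is > 0, else u3."""
    u1, u2, u3 = u
    return u1 if u2 > 0 else u3


def seq_ud(ud1, ud2):
    """Update function of m1; m2 on composite states (s1, s2)."""
    def ud(x, state):
        s1, s2 = state
        o, s1_next = ud1(x, s1)
        a, s2_next = ud2(o, s2)
        return a, (s1_next, s2_next)
    return ud


ud_msw = lambda x, unit: (sw(x), unit)    # [M sw_R]_ud, with trivial state () in 1
ud_delay = lambda x, s: (s, x)            # [delay]_ud = shift = Br_{R,R}

composed = seq_ud(ud_msw, ud_delay)       # [M sw_R; delay]_ud on states ((), s)
simplified = lambda x, s: (s, sw(x))      # (sw_R ⊗ id_R); Br_{R,R}

all_ok = True
for x in [(5, 1, 7), (5, 0, 7), (-2, True, 9)]:
    for s in [0, 3.5]:
        out, (_unit, s_next) = composed(x, ((), s))   # drop the unit state
        all_ok = all_ok and (out, s_next) == simplified(x, s)
print(all_ok)   # True
```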

This example illustrates how our categorical algebra for Mealy machines is 
structurally similar to the one used in [6] which describes the algorithm that rep- 
resents block diagrams in terms of sequential/parallel/feedback configurations 
of components. The algorithm from [6] constructs descriptions which contain no 
feedback operations. A similar result can be shown in our framework, allowing 
us to produce trace-free descriptions of update functions in terms of the update 
functions of their components. 
Fig. 7. The update function of a Mealy machine with feedback: (a) Mealy machine, (b) update function, (c) function rearranged


As mentioned in Sect. 2.3, not all feedback configurations are valid. The valid- 
ity of a feedback configuration describing a Mealy machine is decided by deter- 
mining whether or not the trace on its update function is defined. In many 
settings, the trace is defined if the aforementioned fixed-point equations have 
a unique solution [13]. However, for Simulink models that are used to generate 
embedded software, the configuration must satisfy a stricter validity condition:
there must be no algebraic loops. This means there can be no cyclic
dependencies in the underlying update function, so any feedback can be trivially
removed by rearranging the components and wiring to “yank out” the loops
while preserving the connections between blocks. For example, Fig. 7 illustrates 
how the update function of a simple feedback configuration can be rearranged 
to remove loops. This can be formalized by the notion of vacuous guardedness 
introduced in [7]. 
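One conventional way to detect algebraic loops is a cycle check on the same-step dependency graph between blocks, where a connection into a block without direct feedthrough (such as a unit delay) creates no same-step dependency. The sketch below uses Kahn's topological sort for this; it is a standard graph technique offered for illustration, not the vacuous-guardedness formalization of [7], and the function name and block names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def has_algebraic_loop(wires: Iterable[Tuple[str, str]],
                       direct_feedthrough: Set[str]) -> bool:
    """wires: (src, dst) block connections.
    direct_feedthrough: blocks whose output depends on their current input.
    Edges into blocks without direct feedthrough are dropped before the cycle check."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst in wires:
        nodes.update((src, dst))
        if dst in direct_feedthrough:      # same-step dependency src -> dst
            graph[src].append(dst)
            indeg[dst] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:                           # Kahn's algorithm
        n = queue.popleft()
        visited += 1
        for m in graph[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return visited < len(nodes)            # nodes on a cycle are never visited


loop = [("sw", "delay"), ("delay", "add"), ("add", "sw")]
print(has_algebraic_loop(loop, {"sw", "add"}))            # False: the delay breaks it
print(has_algebraic_loop(loop, {"sw", "add", "delay"}))   # True
```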

This means that the update functions of well-formed block diagrams can 
be modelled without traces. In this manner, the update function of the block 
diagram in Fig. 1a can be described as 


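$$\mathrm{Br}_{\mathbb{B},\mathbb{R}};\ ([-1] \otimes \Delta_{\mathbb{R}} \otimes [10] \otimes \mathrm{id}_{\mathbb{B}} \otimes [0]);\ (\mathit{add} \otimes \Delta_{\mathbb{R}} \otimes \mathit{sw}_{\mathbb{R}});\ (\mathrm{id}_{\mathbb{R}^2} \otimes \mathrm{Br}_{\mathbb{R},\mathbb{R}});\ (\mathit{sw}_{\mathbb{R}} \otimes \mathit{gtz});\ \mathrm{Br}_{\mathbb{R},\mathbb{B}}$$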


where each individual function has a fixed definition, and can be represented 
as a predefined tabular expression. Here gtz denotes the > 0 block labelled 
IsRunning. Functions whose behaviours are not conditional are trivially rep- 
resented by a table with a single condition: true. 
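The trace-free expression above can be executed directly under a flat-tuple encoding of functions. In this sketch the helper names are hypothetical, the update function maps `(start, counter)` to `(running, counter')`, the initial counter value 0 is an assumption, and $\mathit{sw}_{\mathbb{R}}$ is assumed to pass its first input when its control input is greater than 0 and its third input otherwise.

```python
from typing import Callable, Tuple

Tup = Tuple[object, ...]


class Fn:
    def __init__(self, arity: int, f: Callable[[Tup], Tup]):
        self.arity, self.f = arity, f

    def __call__(self, *xs: object) -> Tup:
        assert len(xs) == self.arity, "wrong number of input components"
        return self.f(tuple(xs))


def tensor(*fns: Fn) -> Fn:                 # parallel composition ⊗
    def run(xs: Tup) -> Tup:
        out: Tup = ()
        i = 0
        for g in fns:
            out += g.f(xs[i:i + g.arity])
            i += g.arity
        return out
    return Fn(sum(g.arity for g in fns), run)


def seq(*fns: Fn) -> Fn:                    # sequential composition ;
    def run(xs: Tup) -> Tup:
        for g in fns:
            xs = g.f(xs)
        return xs
    return Fn(fns[0].arity, run)


def const(k: object) -> Fn:                 # [k] : 1 -> R
    return Fn(0, lambda _unit: (k,))


def br(m: int, n: int) -> Fn:               # Br_{A,B}, |A| = m and |B| = n components
    return Fn(m + n, lambda x: x[m:] + x[:m])


id1 = Fn(1, lambda x: x)                    # id_R or id_B
id2 = Fn(2, lambda x: x)                    # id_{R^2}
dup = Fn(1, lambda x: (x[0], x[0]))         # Δ_R
add = Fn(2, lambda x: (x[0] + x[1],))
gtz = Fn(1, lambda x: (x[0] > 0,))          # the "> 0" block IsRunning
# ASSUMPTION: sw_R passes u1 when the control u2 is > 0, else u3.
sw = Fn(3, lambda x: (x[0] if x[1] > 0 else x[2],))

ud = seq(br(1, 1),
         tensor(const(-1), dup, const(10), id1, const(0)),
         tensor(add, dup, sw),
         tensor(id2, br(1, 1)),
         tensor(sw, gtz),
         br(1, 1))


def run(starts, counter=0):
    """Iterate the update function from an assumed initial counter."""
    history = []
    for start in starts:
        running, counter = ud(start, counter)
        history.append((running, counter))
    return history


print(run([True, False, False]))   # [(False, 10), (True, 9), (True, 8)]
```

The three steps agree with the rows of Fig. 8a: a zero counter with `start` set loads 10, and a positive counter is decremented while `running` is true.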

HCTs—being representations of functions—can be composed like functions. 
We modify the composition operation in [20] to describe HCTs so that we 
can compose predefined tabular expressions as stated above. When compos- 
ing two HCTs sequentially, the conditions of the first HCT appear first in the 
composed HCT and the conditions of the second HCT are included as sub- 
conditions. The conditions from the second HCT are evaluated using the out- 
put values from the first one. Consider, for example, the composition of Fig. 8a 
with Fig. 8b, where the output counter’ of the first table is routed to the input 
counter of the second (ignore the running output for now). Their composition 
is shown in Fig. 8c (ignore the running and counter’ outputs). The conditions 
counter > 0 and start (and their complements) appear in the same configuration
as the first HCT. However, the sub-conditions (e.g. counter − 1 ≤ 0) come
(a) ud
  Conditions                || running | counter’
  counter > 0               || true    | counter − 1
  counter ≤ 0  | start      || false   | 10
               | ¬start     || false   | 0

(b) md
  Condition     || mode
  counter > 0   || Running
  counter ≤ 0   || Stopped

(c) $ud^+$
  Conditions                                || running | counter’    | mode’
  counter > 0  |         | counter − 1 > 0  || true    | counter − 1 | Running
               |         | counter − 1 ≤ 0  || true    | counter − 1 | Stopped
  counter ≤ 0  | start   | 10 > 0           || false   | 10          | Running
               |         | 10 ≤ 0           || false   | 10          | Stopped
               | ¬start  | 0 > 0            || false   | 0           | Running
               |         | 0 ≤ 0            || false   | 0           | Stopped


from the conditions (counter ≤ 0) in the second HCT, evaluated with the values
(counter ↦ counter − 1) from the row in the first HCT associated with the
parent condition (counter > 0). The conditions 10 > 0 and 0 > 0 (and their 
complements) are generated in a similar manner, but because they are trivially 
satisfied/impossible conditions, the sub-conditions/entire row can be removed 
(the removable conditions/rows are shaded in Fig. 8c). 

Similarly to the conditions, the output expressions of the second HCT are 
evaluated with the corresponding values from the first HCT, and those are used 
as the output expressions of the combined HCT. In Fig. 8b, the output values 
for mode are constants, therefore they appear unchanged in Fig. 8c. For HCTs 
composed in parallel, the conditions from the second HCT are once again used 
as sub-conditions, but they are not modified. Similarly, the output expressions 
from both HCTs are placed in the combined table unchanged. 
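The sequential composition of tables can be sketched with rows encoded as Python predicates and output functions over a dictionary of variable values. This is a simplified, hypothetical encoding of the timer tables in Fig. 8a and b: nested conditions are flattened with `and`, the second table's conditions are evaluated on the first table's outputs through a `route` function, and the symbolic simplification that removes trivially true or impossible conditions is not modelled.

```python
from itertools import product

# A flattened HCT: a list of (guard, outputs) rows over an environment dict.
ud_rows = [                                   # Fig. 8a, transcribed by hand
    (lambda e: e["counter"] > 0,
     lambda e: {"running": True, "counter'": e["counter"] - 1}),
    (lambda e: e["counter"] <= 0 and e["start"],
     lambda e: {"running": False, "counter'": 10}),
    (lambda e: e["counter"] <= 0 and not e["start"],
     lambda e: {"running": False, "counter'": 0}),
]
md_rows = [                                   # Fig. 8b, transcribed by hand
    (lambda e: e["counter"] > 0, lambda e: {"mode": "Running"}),
    (lambda e: e["counter"] <= 0, lambda e: {"mode": "Stopped"}),
]


def compose_seq(t1, t2, route):
    """t2's conditions become sub-conditions, evaluated on t1's outputs via route."""
    rows = []
    for (g1, o1), (g2, o2) in product(t1, t2):
        def guard(e, g1=g1, o1=o1, g2=g2):
            return g1(e) and g2(route(o1(e)))
        def outputs(e, o1=o1, o2=o2):
            first = o1(e)
            return {**first, **o2(route(first))}   # keep t1's outputs, add t2's
        rows.append((guard, outputs))
    return rows


def evaluate(table, env):
    hits = [out(env) for guard, out in table if guard(env)]
    assert len(hits) == 1, "conditions must be disjoint and complete"
    return hits[0]


# Route the output counter' of the first table to the input counter of the second.
ud_plus = compose_seq(ud_rows, md_rows, lambda o: {"counter": o["counter'"]})
print(len(ud_plus))
print(evaluate(ud_plus, {"counter": 1, "start": False}))
```

The result has six rows, one per pair of rows; those guarded by impossible sub-conditions (10 ≤ 0 and 0 > 0) simply never fire here, whereas the text removes them symbolically.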

The predefined HCTs representing each function in the equation above can be 
combined using the operations described above to achieve a tabular expression 
for the entire block diagram. For example, the tabular expression in Fig. 2a can 
be obtained this way. 


5 HCTs to STTs: Modes via Tables 


The HCTs produced using the technique described in Sect. 4 are an intermediate 
representation in our translation strategy. They illustrate the decision logic of 
the system as a whole, but the logic is not related to state the way it is for state 
charts, i.e., through modes. This section explains how HCTs are augmented with 
modes to form STTs, and finally state charts. 


5.1 Defining Modes 


The STTs described in Sect. 2.2 have obvious similarities to state charts, but they 
are just syntactic sugar for HCTs. STTs and state charts are modelled as Mealy 
machines with a special state variable mode with values from an enumerated 
set M (see, e.g., extended state machines in [2]). The cells in the first column 
of STTs (see Fig. 2b) express conditions of the form mode = Running which 
compare the value of mode to each element of M. The last column identifies the 
updated value of mode’. Therefore, the state spaces of Mealy machines modelling 
STTs and state charts have the form $Q = S \times M$, where $M$ is the set of modes,
and $S$ contains tuples of the other state variable values.

An HCT produced via the techniques in the previous section describes
the update function $ud$ of a Mealy machine $m = (S, s_0, X, A, ud)$. We will
enhance $m$ with a state variable mode to produce a Mealy machine $m^+ =
(S \times M, (s_0, \mathit{mode}_0), X, A, ud^+)$ whose update function is given by an HCT which
matches the format of an STT. To achieve the goal of improving readability, we
leverage the existing decision logic in HCTs.

When a state chart updates, it only considers the transitions leaving its 
current mode, i.e., depending on its state, only some behaviours are possible. 
The same dependence on state is expressed in HCTs by conditions which depend 
only on the values of state variables, which will be referred to as state conditions. 
For example, in Fig. 8a, if the condition counter > 0 is satisfied, the system can 
only do one thing: decrement counter and set running to true. Our strategy 
associates the condition counter > 0 with a mode of operation Running ∈ M,
and replaces the original condition with mode = Running. We augment the 
HCTs into STTs in a way that preserves the behaviour of the Mealy machines. 

As the modes are all listed in the first column of an STT, the first augmen- 
tation reorders conditions in HCTs so that the state conditions appear first. For 
example, the conditions in Fig. 2a can be rearranged via the methods in [4] to 
obtain Fig. 8a. While our example contains only one pair of state conditions, 
HCTs describing general block diagrams may contain multiple nested state con- 
ditions. The second augmentation uses conjunction to flatten nested state con- 
ditions into a single column with a condition for each branch of the stateful 
logic. 

The augmented HCT now has a specific form (Fig. 8a) which superficially 
resembles an STT, but the behaviour is unchanged. We now introduce a set 
of modes M with each element associated with a distinct condition in the first 
column of the augmented HCT. This association is defined by a function $md :
S \to M$ which maps tuples of state variable values to the mode whose associated
state condition is satisfied. This function is represented by an HCT with the state 
conditions from the augmented HCT, and distinct values from M as outputs. 
The md function for the timer example is given by the HCT in Fig. 8b. 

Next, the Mealy machine is enhanced by introducing a state variable mode
with values from $M$. We design the enhancement to maintain the invariant
that the value of mode always corresponds to the state condition which
the other state variables satisfy. The invariant is satisfied by the initial state
$(s_0, md(s_0))$. The enhanced update function trivially preserves the original
behaviour by ignoring the value of mode, but updates mode’ to maintain the
invariant by evaluating $md$ with the updated state variable values. The update
function is therefore defined as $ud^+ = (ud \otimes {!_M});\ (\mathrm{id}_A \otimes (\Delta_S; (\mathrm{id}_S \otimes md)))$, where
${!_M} : M \to 1 = \{(\mathit{mode}) \mapsto ()\}$ introduces an input whose value is discarded.
Since $ud$ and $md$ are given as HCTs (e.g. Fig. 8a and b), the enhanced update
function can be achieved through composition of tables (e.g. Fig. 8c).
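The construction of $ud^+$ can be sketched for the timer example, with `timer_ud` and `md` transcribed by hand from Fig. 8a and b and an assumed initial counter of 0; the helper `enhance` is a hypothetical name. The loop checks the invariant that mode always matches the state condition satisfied by counter.

```python
def timer_ud(start, counter):
    """Fig. 8a: (start, counter) -> (running, counter')."""
    if counter > 0:
        return True, counter - 1
    return False, (10 if start else 0)


def md(counter):
    """Fig. 8b: the mode whose state condition holds."""
    return "Running" if counter > 0 else "Stopped"


def enhance(ud, md):
    """ud+ = (ud ⊗ !_M); (id_A ⊗ (Δ_S; (id_S ⊗ md)))."""
    def ud_plus(x, state):
        s, _mode = state                   # !_M: the current mode is discarded
        a, s_next = ud(x, s)               # original behaviour, unchanged
        return a, (s_next, md(s_next))     # Δ_S copies s'; md computes mode'
    return ud_plus


ud_plus = enhance(timer_ud, md)
state = (0, md(0))                         # (s0, md(s0)) with s0 = 0 assumed
modes = []
for start in [True] + [False] * 11:
    _running, state = ud_plus(start, state)
    assert state[1] == md(state[0])        # invariant: mode matches the state condition
    modes.append(state[1])
print(modes.count("Running"), state)       # 10 (0, 'Stopped')
```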
This enhanced Mealy machine operates within a subset of the state space $S \times
M$ where the aforementioned invariant holds. The validity of any state condition
can now be deduced from the value of the mode variable (e.g. $(\mathit{counter} > 0) \Leftrightarrow
(\mathit{mode} = \mathit{Running})$). Thus, replacing those conditions with the corresponding
modes in the HCT representation of $ud^+$ does not modify its behaviour. This is
the final step in rearranging the HCT from Fig. 8c into the STT in Fig. 2b.


5.2 Converting to State Charts and Simplifying 


The state chart in Fig. 9 implements the STT in Fig. 2b by creating a transition
for each row and by creating assignment actions to update state and output 
variables. State charts produced in this manner can often be simplified by moving 
common actions from transitions to entry/exit actions of modes, or by removing 
transitions and performing the corresponding actions as during actions. For 
example, the state chart in Fig. 9 simplifies to the one in Fig. 1b. 
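Equivalence between the block-diagram logic and the chart can be illustrated for the timer by exhaustively exploring the reachable states and comparing outputs, next counter values, and modes at every step. This brute-force check is only a sketch, not how Simulink Design Verifier works; `block_ud`, `md`, `chart_step`, and its `exit_at` parameter are hypothetical, with `exit_at=0` reproducing the manual mistake of leaving Running when counter is zero rather than one.

```python
from collections import deque


def block_ud(start, counter):
    """Timer logic of Fig. 8a: (start, counter) -> (running, counter')."""
    if counter > 0:
        return True, counter - 1
    return False, (10 if start else 0)


def md(counter):
    return "Running" if counter > 0 else "Stopped"


def chart_step(start, counter, mode, exit_at=1):
    """One step of the chart: one transition per STT row."""
    if mode == "Running":
        if counter > exit_at:
            return True, counter - 1, "Running"
        return True, counter - 1, "Stopped"
    if start:
        return False, 10, "Running"
    return False, 0, "Stopped"


def equivalent(step, init_counter=0):
    """Explore all reachable (counter, mode) states and compare both models."""
    init = (init_counter, md(init_counter))
    seen, todo = {init}, deque([init])
    while todo:
        counter, mode = todo.popleft()
        for start in (False, True):
            running, c_next = block_ud(start, counter)
            expected = (running, c_next, md(c_next))
            actual = step(start, counter, mode)
            if actual != expected:
                return False
            nxt = actual[1:]                 # (counter', mode')
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True


print(equivalent(chart_step))                                      # True
print(equivalent(lambda s, c, m: chart_step(s, c, m, exit_at=0)))  # False
```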

In the example given above, it is crucial that the new state variable mode is 
tracked in addition to the existing variable counter. The mode variable tracks the 
high level system state, but the counter variable is still important for tracking 
the detailed system state. This additional information is not always important, 
i.e., sometimes the mode is sufficient and the old state variable may be removed 
from the description of the Mealy machine. This may happen if a Boolean state 
variable generates a state condition; knowing the value of mode can be sufficient 
to deduce the value of the original state variable. It is also possible that a state 
variable from the block diagram stores more detailed information than necessary, 
and knowing the mode is sufficient for the state chart to act. In these cases, the 
unnecessary state variables can be removed from the state chart. 


6 Prototype, Evaluation, and Future Work 


The methodologies presented here have been used to develop a prototype tool 
which automatically refactors Simulink model fragments to Stateflow [18]. The 
tool supports a large subset of discrete Simulink blocks typically used for imple- 
mentation of embedded software. The refactoring tool is implemented in Matlab 
and integrates with Simulink allowing the user to select the blocks they would 
like to replace. When the tool is invoked, it generates a Stateflow chart and uses 
the Simulink Design Verifier [17] to verify that it is equivalent to the selected 
blocks. 

The prototype tool improves the readability of small to medium sized block 
diagrams such as the one in Fig. 1a. However, we found that the stateful logic of
complex industrial-scale models incorporates multiple state machines interacting 
with each other and with stateless conditional logic. To elegantly represent these 
complex block diagrams in Stateflow, the translation methodologies presented 
here can be enhanced to utilize the more sophisticated mechanisms of state 
charts such as hierarchical/parallel modes. We believe that many state chart
mechanisms have analogies in tabular expressions, e.g., hierarchies of state
conditions can be leveraged to specify sub-modes. We found that block diagrams
encoding more than 4 high-level modes can often become difficult to understand 
without these mechanisms. 


Fig. 9. State chart equivalent to STT (mode Running; transitions [start]/{counter = 10; running = false;}, [~start]/{counter = 0; running = false;}, [counter > 1]/{counter = counter - 1; running = true;}, and [counter <= 1]/{counter = counter - 1; running = true;}; action {counter = 0;})


We also recognize the importance of finding refactorable fragments in large 
models. In fact, the translation methodology presented in this paper was devel- 
oped in parallel with an identification strategy that pinpoints block diagrams 
which are candidates for refactoring—it searches for certain patterns of logical 
and stateful blocks which indicate complex state update logic. An elaborated 
description of both translation and identification strategies will be presented in 
the master’s thesis of the first author [29]. 


7 Related Work 


Several papers propose translating Simulink block diagrams to formal languages 
to enable their verification using existing tools (e.g., [1,6,14,23,27,30]). Only 
a few, however, translate Simulink block diagrams to state transition diagrams. 
In [19], Simulink block diagrams are converted into an extended version of hybrid 
automata, with each block in a block diagram converted to a hybrid automa- 
ton, leading to an explosion in the number of states of the resulting model. 
In [31], Simulink models are converted to finite state machines, but transitions
between states represent the small execution steps of individual block updates,
not changes in the high-level system modes. Neither the studies [19,31] nor [16]
aim to capture the high-level state machine of an entire block diagram.
This is exactly what our approach does, with maintainability of the resulting 
model as a prime motivator. 

Our approach to modelling Mealy machines and their interactions using the 
monoidal category Mealy follows a general trend in behavioural modelling. For 
example, monoidal categories have been used to describe interactions of quantum 
processes [5], labelled transition systems [12], and control systems [3]. The alge- 
bra of (traced symmetric) monoidal categories is similar to the algebra used to 


SL2SF: Refactoring Simulink to Stateflow 279 


describe block diagrams in [6], but our approach uses a standard mathematical 
framework with a rich history and many known results. For example, the results 
of [9] indicate that by considering equivalence up to bisimilarity, the category 
Mealy is symmetric monoidal, meaning the appropriate axioms and resulting 
properties of this structure are already known. 


8 Conclusion 


In this paper, we proposed a method for translating Simulink block diagrams 
to Stateflow state charts via tabular expressions representing their respective 
Mealy machines’ update functions. A categorical framework for composing Mealy
machines provides a theoretical basis for the translation. To the best of our 
knowledge, this is the first method for Simulink to Stateflow translation. Our 
proposed method is relevant to industrial development where it can help improve 
software maintainability and aid compliance with modelling guidelines. 
Abstract. Various kinds of typed attributed graphs can be used to represent
states of systems from a broad range of domains. For dynamic systems,
established formalisms such as graph transformation can provide
a formal model for defining state sequences. We consider the case where 
time may elapse between state changes and introduce a logic, called Met- 
ric Temporal Graph Logic (MTGL), to reason about such timed graph 
sequences. With this logic, we express properties on the structure and 
attributes of states as well as on the occurrence of states over time that 
are related by their inner structure, which no formal logic over graphs 
concisely accomplishes so far. 

Firstly, based on timed graph sequences as models for system evolu- 
tion, we define MTGL by integrating the temporal operator until with 
time bounds into the well-established logic of (nested) graph conditions. 
Secondly, we outline how a finite timed graph sequence can be repre- 
sented as a single graph containing all changes over time (called graph 
with history), how the satisfaction of MTGL conditions can be defined for 
such a graph and show that both representations satisfy the same MTGL 
conditions. Thirdly, we present how MTGL conditions can be reduced 
to (nested) graph conditions and show using this reduction that both 
underlying logics are equally expressive. Finally, we present an extension 
of the tool AUTOGRAPH that allows checking the satisfaction of MTGL
conditions for timed graph sequences, by checking the satisfaction of the 
(nested) graph conditions, obtained using the proposed reduction, for 
the graph with history corresponding to the timed graph sequence. 


Keywords: Nested graph conditions - Metric temporal logic - 
Sequence properties - Typed attributed graphs - Symbolic graphs 


1 Introduction 


Various kinds of typed attributed graphs are used to represent states of systems 
from a broad range of domains. Also, the evolution of such systems can be 
described using a multitude of graph transformation formalisms in which the 
possible behavior in the form of graph sequences is defined by a set of rules and their
application. In many cases, the analysis of this induced behavior with respect
to a specification in the form of a temporal logic that defines the admissible graph
sequences is of paramount importance. 
In our running example, which we use to demonstrate the lack of suitable specification
formalisms, we consider a dynamic system describing an operating system,
which generates timed sequences of (typed attributed) graphs to model the
change of the operating system states over time. In this example, users may create
tasks with identifiers id, the operating system may create handlers specific
to task identifiers to allow for the task execution, and the handlers may produce
a result when a task has been executed (marking the successful handling of the
task). To model the states of the operating system, we employ graphs that store
the tasks, the handlers, and the computed results. In the remainder, we refer in
the context of this example to the sequence property P, which is to be checked against the
timed graph sequence at hand describing the system’s state changes over time.


P: Whenever a task T with identifier id is created on a system S, a handler H
for this task (i.e., with a task identifier t_id equal to id of T) must exist.
Moreover, within 120 time units, the handler must produce a result R with
value success and, during the computation of the result, no other handler
H’ for the same task (i.e., with the same task identifier t_id) may exist.


We consider the problem that existing specification formalisms for graph- 
based systems cannot cover properties such as P. The available (metric) tempo- 
ral logics, such as Metric Temporal Logic (MTL) [16], are defined over Kripke 
structures abstracting from the system states by labeling each state with a subset
of the finite set of atomic propositions. The commonly used operator until
then allows us to formalize the part of property P stating that every graph that
contains a task T is followed by some graph containing some result R before t 
time units. However, the existing metric temporal logics do not support the use 
of bindings of elements contained in the graphs to express how a certain matched 
pattern evolves in a sequence of graphs. Therefore, they are insufficient when e.g. 
creating different tasks T and T’ must be followed by creating the corresponding 
results R and R’ while also treating the deadlines for their existence separately. 
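To illustrate why bindings matter, the sketch below checks one naive reading of P directly over a finite timed sequence. It is not MTGL, its semantics, or the AUTOGRAPH tool: the encoding of each state graph as sets of task ids, `(handler, t_id)` pairs, and `(handler, value)` results is hypothetical, as is reading "during the computation" as "at every step strictly before the result appears". Each created task is checked against its own handler and its own 120-time-unit deadline.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, set]   # hypothetical encoding of one state graph


def check_p(seq: List[Tuple[float, Graph]], bound: float = 120) -> List[Tuple[int, object]]:
    """Return (step, task id) pairs violating a naive reading of property P."""
    violations = []
    for i, (t_i, g_i) in enumerate(seq):
        before = seq[i - 1][1]["tasks"] if i > 0 else set()
        for tid in g_i["tasks"] - before:               # task T created at step i
            ok = False
            for h, h_tid in g_i["handlers"]:
                if h_tid != tid:                        # bind H to T via t_id = id
                    continue
                for t_j, g_j in seq[i:]:
                    if t_j - t_i > bound:               # deadline of this task passed
                        break
                    if (h, "success") in g_j["results"]:
                        ok = True                       # this H produced its result
                        break
                    if any(h2 != h and t2 == tid for h2, t2 in g_j["handlers"]):
                        break                           # another handler H' for T
                if ok:
                    break
            if not ok:
                violations.append((i, tid))
    return violations


def graph(tasks=(), handlers=(), results=()) -> Graph:
    return {"tasks": set(tasks), "handlers": set(handlers), "results": set(results)}


seq = [
    (0, graph(tasks=[1], handlers=[("h1", 1)])),
    (30, graph(tasks=[1, 2], handlers=[("h1", 1), ("h2", 2)])),
    (100, graph(tasks=[1, 2], handlers=[("h1", 1), ("h2", 2)],
                results=[("h1", "success")])),
    (200, graph(tasks=[1, 2], handlers=[("h2", 2)],
                results=[("h1", "success"), ("h2", "success")])),
]
print(check_p(seq))   # [(1, 2)]
```

Task 1 meets its deadline while task 2 does not, so only `(1, 2)` is reported, even though some successful result does appear within 120 time units of task 2's creation; a propositional encoding without bindings could not tell the two tasks apart.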

As a first contribution, we define Metric Temporal Graph Logic (MTGL) 
for the concise specification of systems that generate timed graph sequences. In 
MTGL, we express properties on states using the well-known formalism of nested 
graph conditions [12,24] (called GCs for short). The satisfaction of a GC that 
states the existence of a graph pattern H in the given graph G results in a match 
m from H to G. We extend the logic of GCs to MTGL by extending GCs with 
the metric temporal operator until that may appear in the scope of a previously 
determined match m. Using this extension, we can express properties, such as 
property P, on the structure and attributes of states as well as on the occurrence 
of states over time where the preservation/extension of matches during a system’s
evolution increases the expressiveness beyond the existing formal logics. 

As a second contribution, we outline how a finite timed graph sequence can 
be represented as a single graph containing all changes over time (called graph 
with history), how the satisfaction of MTGL conditions can be defined for such 
a graph, and show that both representations satisfy the same MTGL conditions. 

As a third contribution, we show that MTGL conditions can be reduced to 
GCs using attribute constraints to encode the metric temporal requirements, 


284 H. Giese et al. 


while preserving the satisfaction for finite timed graph sequences. This encoding 
enables the direct application of techniques for GCs such as [25]. 

As a fourth contribution, we present an extension of the tool AUTOGRAPH [25]
that allows checking the satisfaction of MTGL conditions for timed graph sequences
by checking the satisfaction of the GCs obtained using the proposed reduction for
the graph with history corresponding to the timed graph sequence at hand.

The paper is structured as follows. Section 2 discusses related work. Section 3
recalls technical preliminaries. Section 4 defines timed graph sequences,
MTGL, and the satisfaction of MTGL conditions for timed graph sequences.
In Sect. 5, we show how to represent a finite timed graph sequence as a single
graph with history, define satisfaction of MTGL conditions for a graph with
history, and prove that both representations satisfy the same MTGL conditions.
In Sect. 6, we introduce a reduction of MTGL conditions to GCs and show the
equivalence of these two logics. Finally, Sect. 7 discusses the tool support and
Sect. 8 concludes the paper with a summary and remarks on future work.


2 Related Work 


There are several related formal and informal approaches for the specification 
and verification of different kinds of sequence properties. 

In [13] the satisfaction of CTL (state/sequence) properties is checked where 
the tool GROOVE [10,26] is used to generate the finite state space of the graph 
transformation system (GTS) at hand. In [7] invariants are checked for a GTS 
with a possibly infinite state space. A check of the validity of given pre/post
conditions for a program over a GTS has been presented in [23]. In [2,15] temporal properties
for GTS with infinite state space are checked using the tool AUGUR2. 

In [19] the satisfaction of graph-based probabilistic timed CTL properties is 
checked where the tool HENSHIN [1,8] is used to generate the finite state space of 
a GTS and where the tool PRISM [17] is used to model check translations of the 
given properties. In [6] a sequence of timed events is checked against sequence
properties given by regular languages based on deterministic finite automata.

The use of bindings, as in this paper, is supported in [3] where bindings are 
part of the Metric First-Order Temporal Logic in which system states are repre- 
sented by a set of relations that are adapted during the execution of the system. 

A visual but informal notation for the specification of sequence properties 
involving time and graph bindings was introduced in [14]. 

In conclusion, existing approaches with a formal semantics do not support
time, bindings, and graphs together in a concise manner. Hence, our graph-based
logic MTGL for graph-based systems complements existing approaches since
(a) it eases usability in graph-based contexts similarly to the usage of GCs, which
are favored over first-order logic in these contexts, (b) it enables further
developments and combinations with other graph-based techniques such as those in [25],
and (c) we expect, subject to confirmation by future tool-based evaluations, that
domain-specific tools for checking MTGL conditions are more efficient than
general-purpose tools, as shown analogously for GCs in [23].
[Figure 1: the type graph TG with node types Task, System, Handler, and Result, edge types on, for, by, and to, and attributes id : string, t_id : string, and value : string]


Fig. 1. The type graph TG for our running example where the attributes cts and dts of 
sort real used in later sections are omitted in every node and edge to improve readability 


3 Typed Attributed Graphs and Graph Conditions 


We now recall typed attributed graphs and nested graph conditions used for 
representing system states and properties on these states, respectively. 

We use symbolic graphs [21] to encode (finite) typed attributed graphs. Symbolic
graphs are an adaptation of E-GRAPHS [9] where a graph does not contain
data nodes (i.e., elements that represent actual values); instead, node and edge
attributes are connected to variables, which replace the data nodes. Symbolic
graphs are also equipped with attribute constraints over these (sorted) variables
(e.g., $x = 5$, $x < 5$, and $y = \text{"aabb"}$).

We consider symbolic graphs that are typed over a type graph TG using a
typing morphism $type: G \to TG$. Type graphs restrict attributed graphs to an
admitted subset. For our running example, we employ the type graph TG from
Fig. 1. An example of a symbolic graph that is typed over TG is given in Fig. 4.

We state the existence and nonexistence of graph patterns in a given symbolic
graph, which is called a host graph, by representing graph patterns by symbolic
graphs and by using monomorphisms (called monos and denoted by $\hookrightarrow$
subsequently) to extend graph patterns. Formally, we rely on the notion of nested
graph conditions (GCs) [12], which are expressively equivalent to first-order logic
on graphs [5] as shown in [12,24].

Definition 1 (Graph Conditions (GCs)). The class of graph conditions
(GCs) $\Phi^{GC}_H$ for the graph H contains $\psi$ if one of the following cases applies.

- $\psi = \wedge S$ and $S = \{\phi_1, \ldots, \phi_n\} \subseteq \Phi^{GC}_H$.
- $\psi = \neg\phi$ and $\phi \in \Phi^{GC}_H$.
- $\psi = \exists(a, \phi)$, $a: H \hookrightarrow H'$, and $\phi \in \Phi^{GC}_{H'}$.

GCs allow for further abbreviations such as true, false, $\vee S$, and $\forall(a, \phi)$.


Intuitively, a GC is satisfied if the positive but not the negative patterns given by 
the GC can be found in the given host graph. For the case of the exists operator, 
a previously determined match m must be extendable using a mono q according 
to the mono a from the GC. 


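Definition 2 (Satisfaction of GCs). A GC $\psi \in \Phi^{GC}_H$ is satisfied by a
mono $m: H \hookrightarrow G$, written $m \models \psi$, if one of the following cases applies.

- $\psi = \wedge S$ and $m \models \phi$ for each $\phi \in S$.
- $\psi = \neg\phi$ and not $m \models \phi$.
- $\psi = \exists(a: H \hookrightarrow H', \phi)$ and there exists $q: H' \hookrightarrow G$ such
that $q \circ a = m$ and $q \models \phi$ (i.e., the triangle formed by a, q, and m commutes).

A GC $\psi$ over the empty graph is satisfied by a graph G, written $G \models \psi$, if $i_G \models \psi$,
where $i_G: \emptyset \hookrightarrow G$ is the initial morphism to G.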
4 Metric Temporal Graph Logic


We build upon GCs [12] and the future fragment of MTL [16,22] to introduce 
Metric Temporal Graph Logic (MTGL) by defining its syntax and semantics. 

We assume a graph-transformation-based formalism for the definition of steps
changing a graph while possibly also determining a progress of time. We abstract
from the actual timed graph transformation formalism employed and only assume
that it is capable of generating so-called timed graph sequences (TGSs for short),
which contain the graphs, their modifications, and the elapsed time between
successive graphs. In the following, we are concerned with TGSs in which either
only the past states of sequences are given in the form of finite TGSs or where,
alternatively, an infinite TGS describes a nonterminating evolution of a system.

A step from a graph G to a graph $G'$ where G has remained unchanged
for a duration of $\delta$, which may be determined by a timed graph transformation
formalism, is represented by $G \cdot (\delta, l, r) \cdot G'$ in our notion of TGSs. In this
representation, the monos $l: IG \hookrightarrow G$ and $r: IG \hookrightarrow G'$ identify the graph elements
that are preserved from G to $G'$, i.e., $G - l(IG)$ are the nodes and edges that
are present in G but are deleted to obtain $G'$, and $G' - r(IG)$ are the nodes and
edges that do not exist in G but are created to obtain $G'$.¹


Definition 3 (Timed Graph Sequences (TGSs)). We inductively define the
class of finite timed graph sequences (TGSs) $\Pi^{fin}$ as follows:

- If $\pi = G_{init}$ is the sequence containing only the graph $G_{init}$, then $\pi \in \Pi^{fin}$.
- If $\pi \in \Pi^{fin}$ is a TGS ending with a graph G, $l: IG \hookrightarrow G$ and $r: IG \hookrightarrow G'$ are
monos (for an interface graph IG), and $\delta \in \mathbb{R}^+_0$ is the timepoint where the
graph G is changed relative to the previous change, then $\pi \cdot (\delta, l, r) \cdot G' \in \Pi^{fin}$.

The class of TGSs $\Pi$ contains the finite TGSs $\Pi^{fin}$ from above and all infinite
sequences that have only finite TGSs from $\Pi^{fin}$ as prefixes.

Moreover, $dur(\pi)$ denotes the sum of all durations $\delta$ contained in $\pi$. Additionally,
if $dur(\pi) = \infty$, $\pi_t$ denotes the unique graph at time t, i.e., if $\pi = G$
then $\pi_t = G$, and if $\pi = G \cdot (\delta, l, r) \cdot \pi'$ then $\pi_t = G$ for $t < \delta$ and $\pi_t = \pi'_{t-\delta}$
for $t \ge \delta$. Finally, if $dur(\pi) = \infty$, $\pi_{[t_1, t_2]}$ denotes the finite TGS contained in
$\pi$ between and including $\pi_{t_1}$ and $\pi_{t_2}$.


We do not require that every step modifies the current graph (i.e., we permit
$G = G'$, possibly using $l = r = id_G$). Also, time may not elapse in a step (i.e.,
we permit $\delta = 0$), but for well-definedness of the satisfaction relation for TGSs
we require that time diverges in every infinite TGS $\pi$ (i.e., $dur(\pi) = \infty$).

In our running example, we simplify the presentation by using only inclusions
l and r. The TGS $\pi$ given in Fig. 2 contains five graphs $G_i$ for $i \in \{0, 1, 2, 3, 4\}$
showing the system states at five different points in time, namely 0, 5, 10, 13,
and 15. The corresponding durations during which the respective graphs $G_i$ remain
unchanged are denoted by $\delta_i$ for $i \in \{0, 1, 2, 3\}$.
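As an illustration only, the following Python sketch recomputes the durations $\delta_i$ of the running example from the timepoints 0, 5, 10, 13, and 15 and determines which graph $G_i$ is $\pi_t$ at a given time t. It abstracts each graph to its index; the function names are hypothetical, and the last graph is assumed to persist after $dur(\pi)$, as in the recursive definition of $\pi_t$.

```python
from bisect import bisect_right

# Absolute timepoints at which G_0, ..., G_4 of the running example appear.
TIMEPOINTS = [0, 5, 10, 13, 15]


def durations(timepoints):
    """delta_i: how long G_i stays unchanged before the step to G_{i+1}."""
    return [b - a for a, b in zip(timepoints, timepoints[1:])]


def graph_index_at(timepoints, t):
    """Index i with pi_t = G_i. For zero-duration steps (equal timepoints) the
    latest graph wins, matching pi_t = pi'_{t - delta} for t >= delta."""
    if t < 0:
        raise ValueError("timepoints are non-negative")
    return bisect_right(timepoints, t) - 1
```

For example, every t in [10, 13) yields index 2, which is why Example 1 below only needs to inspect $G_2$ for that interval, and $dur(\pi)$ is the sum of the durations, 15.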


1 The span $G \hookleftarrow IG \hookrightarrow G'$ does not correspond to a rule as used in the DPO approach
but rather to a rule application describing changes between the graphs G and $G'$.
[Figure 2: the TGS π with the graphs G0 to G4, containing S:System, T:Task (id = 123), H:Handler (t_id = 123), R:Result (value = success), and the edges e1 to e4]


Fig. 2. A TGS $\pi$ for our running example. For $i \in \{0, 1, 2, 3\}$, the arrows between
graphs of the TGS describe changes $G_i \cdot (\delta_i, l_i, r_i) \cdot G_{i+1}$, where the inclusions $l_i$ and $r_i$
are implicitly given by the usage of the same names in all graphs.


[Figure 3: graphical form of the MTGC ψ, combining a forall-new, an exists, and an until operator with interval [0, 120] over the nodes T:Task, S:System, H:Handler, and R:Result]

Fig. 3. The property P from our running example formalized by the MTGC $\psi$


The syntax of MTGL is given by Metric Temporal Graph Conditions (short 
MTGCs) introduced in the following definition. The distinguishing feature of 
MTGL is the extension of the binding of graph elements used by the operator 
exists in GCs to the until operator of MTL. This allows for the formalization 
of properties where a match into a graph is preserved/extended over multiple 
timepoints in the subsequently introduced semantics for TGSs. 


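Definition 4 (Metric Temporal Graph Conditions (MTGCs)). The class
of metric temporal graph conditions (MTGCs) $\Phi^{MTGC}_H$ for the graph H contains
$\psi$ if one of the following cases applies.

- $\psi = \wedge S$ and $S = \{\phi_1, \ldots, \phi_n\} \subseteq \Phi^{MTGC}_H$,
- $\psi = \neg\phi$ and $\phi \in \Phi^{MTGC}_H$,
- $\psi = \exists(a, \phi)$, $a: H \hookrightarrow H'$, and $\phi \in \Phi^{MTGC}_{H'}$,
- $\psi = \phi_1 \, U_I \, \phi_2$, I is an interval over $\mathbb{R}^+_0$, and $\{\phi_1, \phi_2\} \subseteq \Phi^{MTGC}_H$.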

Further metric temporal operators can be defined as for MTL and GCs. 

For our running example, we formalize the property P from Sect. 1 by the
MTGC $\psi$ depicted in Fig. 3. In this MTGC, we additionally use the forall-new
operator in the form of $\forall_N(a: H \hookrightarrow H', \phi)$ to match the pattern $H'$ into the
considered TGS as soon as possible, i.e., precisely at the minimal timepoint at
which all elements of $H'$ exist. This operator can be encoded by the equivalent
MTGC $\neg((\neg\exists(a, \neg\phi)) \, U_{[0,\infty)} \, \exists(a, \neg\phi))$, which intuitively states that "there is no
violation ever that did not exist before". Moreover, we use notational conventions
to simplify our presentation of MTGCs by omitting elements in subconditions.
Firstly, we omit nodes (such as T) if no new edges or attributes are attached to
them. Secondly, we omit edges (such as e1) if no new attributes are attached to
them. Finally, we omit attributes (such as id of T) in general.
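The syntax of Definition 4 and the encoding of the derived forall-new operator can be sketched as a small Python data type. The representation is hypothetical: graphs and monos are abstracted to a label string, intervals are shown as closed $[lo, hi]$ unless the upper bound is infinite, and the nesting of PSI below is our reading of Fig. 3 based on Example 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str  # stands for a mono a: H -> H' (graphs abstracted to a label)


@dataclass(frozen=True)
class And:
    subs: tuple  # And(()) encodes true


@dataclass(frozen=True)
class Not:
    sub: object


@dataclass(frozen=True)
class Exists:
    a: Pattern
    sub: object


@dataclass(frozen=True)
class Until:
    left: object
    lo: float
    hi: float  # math.inf encodes the unbounded interval [lo, inf)
    right: object


TRUE = And(())


def forall_new(a, phi):
    """forall_N(a, phi) := not((not exists(a, not phi)) U_[0,inf) exists(a, not phi))."""
    violation = Exists(a, Not(phi))
    return Not(Until(Not(violation), 0, math.inf, violation))


def show(c):
    """Render an MTGC in the notation of the text."""
    if isinstance(c, And):
        return "true" if not c.subs else "(" + " ∧ ".join(map(show, c.subs)) + ")"
    if isinstance(c, Not):
        return f"¬{show(c.sub)}"
    if isinstance(c, Exists):
        return f"∃({c.a.name}, {show(c.sub)})"
    hi = "∞)" if math.isinf(c.hi) else f"{c.hi:g}]"
    return f"({show(c.left)} U[{c.lo:g},{hi} {show(c.right)})"


# Shape of the MTGC psi from Fig. 3 as described in Example 1 (illustrative).
PSI = forall_new(Pattern("T,S,e1"),
                 Exists(Pattern("H,e2"),
                        Until(Not(Exists(Pattern("H',e4"), TRUE)), 0, 120,
                              Exists(Pattern("R,e3"), TRUE))))
```

Calling show(forall_new(Pattern("a"), TRUE)) prints the encoding from the text with a single violation pattern, which makes it easy to inspect how the derived operator expands.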

The MTGC $\psi$ properly formalizes the property P using the binding capabilities
of MTGL as follows: the nodes T, S, and H (together with the edges
e1 and e2 as well as their attributes) are shared among the two subconditions of the
until operator. This implies that the Handler node that must be matched by the
right subcondition of the until operator is the previously bound Handler node
H. Similarly, the System node that may be matched by the left subcondition of
the until operator is the previously bound System node S.

Next we present the MTGL semantics for TGSs that defines when a given 
TGS satisfies a given MTGC. For the definition of this semantics, we first intro- 
duce the concept of a match that is preserved over a finite number of steps given 
by a finite TGS. In the following, we also call such a preserved match a binding. 
The preservation of the match is guaranteed by adapting it according to the 
renaming determined by the steps of the TGS for the case where these steps do 
not remove any element initially matched. 


Definition 5 (Preserved Match for a Finite TGS). A mono $m: H \hookrightarrow G_0$
is preserved over a finite TGS $\pi$ that starts in $G_0$ and ends in $G_n$, resulting in a
mono $m': H \hookrightarrow G_n$, written $m \rightsquigarrow_{\pi} m'$, if one of the following cases applies.

- $\pi = G_0 = G_n$ and $m = m'$.
- $\pi = G_0 \cdot (\delta, l: IG \hookrightarrow G_0, r: IG \hookrightarrow G_1) \cdot \pi'$ and there is $m'': H \hookrightarrow IG$ such that
$l \circ m'' = m$ and $r \circ m'' \rightsquigarrow_{\pi'} m'$.

The fact that the step does not remove elements that are matched by a mono
m is obtained from the existence of a mono $m''$ making the triangle $m = l \circ m''$
commute. The required renaming is then performed by replacing the match m
by $r \circ m''$. The mono $m''$ is uniquely defined when it exists.
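Definition 5 can be illustrated by a minimal Python sketch in which every mono is a finite dictionary on graph elements; this representation and the function name preserve are assumptions for illustration. Because l is injective, the unique $m''$ with $l \circ m'' = m$ exists exactly when every matched element lies in the image of l.

```python
def preserve(m, steps):
    """Return m' with m ~>_pi m' (Def. 5), or None if no such m' exists.

    m:     dict from elements of H to elements of the first graph G_0
    steps: list of pairs (l, r), one per step of pi, where l and r are dicts
           from the interface graph IG into the graphs before/after the step
    """
    for l, r in steps:
        l_inverse = {g: i for i, g in l.items()}  # l is injective
        if any(g not in l_inverse for g in m.values()):
            return None  # a matched element is deleted: no m'' with l o m'' = m
        m_pp = {h: l_inverse[g] for h, g in m.items()}  # the unique m''
        m = {h: r[i] for h, i in m_pp.items()}          # rename the match to r o m''
    return m
```

With inclusions, as in the running example, l and r are identities on the shared names, so the sketch reduces to checking that no matched element is deleted by any step.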
Based on the preservation of matches, we now define the semantics for TGSs. 


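Definition 6 (Satisfaction of MTGCs by TGSs). A given MTGC $\psi \in \Phi^{MTGC}_H$
is satisfied by a TGS $\pi$, an observation timepoint $t \in \mathbb{R}^+_0$, and a mono
$m: H \hookrightarrow \pi_t$, written $(\pi, t, m) \models_{TGS} \psi$, if one of the following cases applies.

- $\psi = \wedge S$ and $(\pi, t, m) \models_{TGS} \phi$ for each $\phi \in S$.
- $\psi = \neg\phi$ and not $(\pi, t, m) \models_{TGS} \phi$.
- $\psi = \exists(a: H \hookrightarrow H', \phi)$ and there is some $q: H' \hookrightarrow \pi_t$ such that $q \circ a = m$ and
$(\pi, t, q) \models_{TGS} \phi$.
- $\psi = \phi_1 \, U_I \, \phi_2$ and there is some $t' \in I$ such that
  • there is $m': H \hookrightarrow \pi_{t+t'}$ such that $m \rightsquigarrow_{\pi_{[t, t+t']}} m'$ and $(\pi, t+t', m') \models_{TGS} \phi_2$, and
  • for every $t'' \in [0, t')$ it holds that there is an $m'': H \hookrightarrow \pi_{t+t''}$ such that
$m \rightsquigarrow_{\pi_{[t, t+t'']}} m''$ and $(\pi, t+t'', m'') \models_{TGS} \phi_1$.

An MTGC $\psi$ over the empty graph is satisfied by a TGS $\pi$, written $\pi \models_{TGS} \psi$,
if $(\pi, 0, i_{\pi_0}) \models_{TGS} \psi$, where $i_{\pi_0}: \emptyset \hookrightarrow \pi_0$ is the initial morphism to the graph at
timepoint 0 of $\pi$ (i.e., the first graph of $\pi$).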
This semantics is similar to the semantics of GCs for conjunction, negation,
and the exists operator, since for the triple $(\pi, t, m)$ it always holds that the
codomain of m is the graph $\pi_t$ and since the checked MTGC is defined for the
domain of m. The TGS $\pi$ and the current timepoint t are used in the case of
the until operator, where we rely on the preserved match relation from above to
change the codomain of a match from $\pi_t$ to the graphs $\pi_{t+t'}$ and $\pi_{t+t''}$ at later
timepoints.


Example 1 (TGS satisfies MTGC). Considering our running example, we argue
that the MTGC given in Fig. 3 is satisfied by the TGS given in Fig. 2. Firstly,
the forall-new operator matches the nodes T and S and the edge e1 in $G_2$ at
timepoint 10, which is the maximal creation timepoint of these three elements. Then,
the exists operator matches the node H together with the edge e2 in $G_2$ at the
same timepoint. Finally, the until operator subsequently matches the node R
and the edge e3 in $G_3$ at timepoint 13, and the remainder true is trivially
satisfied for timepoint 13. In addition, as also required by the until operator,
for every timepoint in the interval [10, 13) it is not possible to match a second
Handler node H' that is connected to S. This holds because the graph in $\pi$ for
the timepoints in this interval is $G_2$, which indeed does not contain
such a second Handler node.


5 Mapping of TGSs to Graphs with History 


Subsequently, we are concerned with finite TGSs $\pi$ (which have a finite number
of steps and therefore also satisfy $dur(\pi) < \infty$) for which the satisfaction of
an MTGC $\psi$ is decidable [4] when replacing in $\psi$ right-open intervals $[r, \infty)$ and
$(r, \infty)$ by $[r, dur(\pi))$ and $(r, dur(\pi))$, respectively. Such an adaptation of intervals
leads to an MTGC $\psi'$ that is bounded and whose satisfaction by the finite
TGS $\pi$ is equivalent (i.e., $\pi \models_{TGS} \psi \Leftrightarrow \pi \models_{TGS} \psi'$).

To analyze the satisfaction of an MTGC by a given finite TGS, we now 
introduce the notion of graphs with history (in short, GHs) as an equivalent 
representation of a given finite TGS. Afterwards, we introduce a semantics
operating on this alternative representation (called the semantics for GHs in the
following) that is compatible with the semantics introduced before for TGSs. The
translation from finite TGSs to GHs reduces the size of the representation in 
terms of the stored data. Moreover, it decouples the observation of modifications, 
resulting in a GH, and the subsequent satisfaction check for possibly several 
MTGCs. 

The notion of GHs for capturing the changes to a current graph over time as
given by a TGS $\pi$ requires that the used type graph TG contains for all nodes
and edges the attributes cts and dts of sort real to capture the total timepoint
at which an element was created and (if applicable) deleted, respectively.²


Definition 7 (Graphs with History (GHs)). Let TG be a type graph where
all nodes and edges have attributes cts denoting the timepoint of their creation
and dts denoting the timepoint of their deletion. Then $G_H$ is a graph with history
(GH) if it is typed over TG and satisfies the following consistency requirements.³

- There is precisely one cts attribute for every graph node and edge.
- There is at most one dts attribute for every graph node and edge.
- For an edge e, the values of the cts attributes of the source and the target nodes
of e are less than or equal to the cts attribute of e.
- For an edge e, the values of the dts attributes of the source and the target
nodes of e are greater than or equal to the dts attribute of e.

We now define the operation Fold, which converts a finite TGS $\pi$ (i.e., a
TGS with a finite number of steps) into the corresponding GH $G_H$. This recursive
operation handles the renaming given by the monos l and r in the steps of $\pi$ and,
moreover, encodes the insertion of additional nodes/edges a by adding attributes
cts = t for these nodes/edges in the constructed $G_H$ and by equipping removed
nodes/edges a with an additional attribute dts = t, where t is the current total
time of the considered TGS $\pi$ in both cases.


Definition 8 (Map TGS to GH (Operation Fold)).

- If $\pi = G_{init}$, then $G_H = Fold(\pi)$ is obtained from $G_{init}$ by adding the
attributes $cts(a) = 0$ to each node or edge a in $G_{init}$.
- If $\pi = \pi' \cdot (\delta, l: IG \hookrightarrow G, r: IG \hookrightarrow G') \cdot G'$ is a TGS, $G'_H = Fold(\pi')$ is the
GH obtained from the mapping of the TGS $\pi'$ using the operation Fold, and
$t = dur(\pi')$ is the total time of $G'_H$, then $G_H = Fold(\pi)$ is constructed from
$G'_H$ by adding the attributes $dts(a) = t + \delta$ to each node or edge $a \in G - l(IG)$,
by renaming each node and edge $a \in l(IG)$ according to l, by adding each
node and edge $a \in G' - r(IG)$, by renaming each node and edge $a \in r(IG)$
according to r, and by adding the attributes $cts(a) = t + \delta$ to each node or
edge $a \in G' - r(IG)$.
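A minimal Python sketch of Fold for the special case used in the running example, where l and r are inclusions, so that the renaming steps of Definition 8 are identities and every element keeps its name (an assumption; names are also assumed not to be reused after a deletion). Each step is given by its duration $\delta$ and the sets $G - l(IG)$ and $G' - r(IG)$ of deleted and created elements; the function name fold and this representation are hypothetical.

```python
def fold(initial, steps):
    """Fold a finite TGS with inclusions l, r into a GH (Def. 8).

    initial: names of the nodes/edges of G_init
    steps:   list of (delta, deleted, created) with deleted = G - l(IG) and
             created = G' - r(IG)
    returns: dict name -> {"cts": ..., optionally "dts": ...}
    """
    gh = {a: {"cts": 0} for a in initial}
    current = set(initial)  # elements of the last graph folded so far
    t = 0                   # dur of the prefix folded so far
    for delta, deleted, created in steps:
        t += delta          # the step happens at total time dur(pi') + delta
        for a in deleted:
            if a not in current:
                raise ValueError(f"{a} is not in the current graph")
            gh[a]["dts"] = t
        for a in created:
            if a in gh:
                raise ValueError(f"name {a} is reused")
            gh[a] = {"cts": t}
        current = (current - set(deleted)) | set(created)
    return gh


# Running example (Example 2): G_0 is empty; changes at 5, 10, 13, and 15.
STEPS = [
    (5, set(), {"S"}),
    (5, set(), {"T", "H", "e1", "e2"}),
    (3, set(), {"R", "e3", "e4"}),
    (2, {"e3"}, set()),
]
```

Applied to STEPS, the sketch yields cts = 5 for S, cts = 10 for T, H, e1, and e2, cts = 13 for R, e3, and e4, and dts = 15 for e3, i.e., the values described in Example 2.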


The following example covers an application of Fold to a finite TGS. 


Example 2 (Map TGS to GH). We map the finite TGS $\pi$ from Fig. 2 to the
GH $G_H$ shown in Fig. 4 using the operation Fold as follows. Since $\pi$ starts
with an empty graph $G_0$, we first map it into the empty GH. The second state
of $\pi$, given by $G_1$ including the System node S, is added to the TGS after 5
time units. We map this TGS state to the GH by adding S to the empty GH


2 The total timepoints of additions and removals of attributes and their values can
be encoded by moving attributes into separate nodes, whose cts and dts
attributes then encode the relevant timepoints.

3 Note that the consistency requirements used in this definition are not guaranteed by 
the formalisms of E-GRAPHS or symbolic graphs. 
[Figure 4: the GH G_H, in which every element carries its creation timepoint, e.g., S:System (cts = 5), R:Result (cts = 13, value = success), and e4:to (cts = 13)]

Fig. 4. Mapping of the TGS $\pi$ from Fig. 2 to the GH $G_H = Fold(\pi)$


and by additionally equipping this node with the creation timepoint cts = 5.
After another 5 time units, a Task node T, a Handler node H, and the
edges e1 and e2 between the existing System node S and the new Task node T and
Handler node H, respectively, are added to the TGS, resulting in the TGS state $G_2$.
These changes are again mapped to the GH by adding the Task node T, the
Handler node H, and the edges e1 and e2 to the current version of $G_H$ and by
additionally equipping them with the creation timepoint cts = 10. In a similar
manner, the Result node R together with the edges e3 and e4 (see the TGS state
$G_3$) is added to the GH with the creation timepoint cts = 13. Finally, after 2 more
time units, the edge e3 is deleted to obtain the TGS state $G_4$. To reflect this in
the GH, we equip the edge e3 in $G_H$ with the additional deletion timepoint dts = 15.


For the satisfaction of an MTGC of the form $\exists(a: H \hookrightarrow H', \phi)$, where the
exists operator is inherited from GCs, it is still required that the pattern
found so far (given by some mono $m: H \hookrightarrow G_H$) in the host graph $G_H$ can be
extended to a larger pattern (given by some mono $m': H' \hookrightarrow G_H$). Additionally,
we have to check that all matched elements have already been created (because the GH
also contains the elements created with higher cts values) but not yet deleted
(because the GH also contains the elements deleted at earlier timepoints). For
the satisfaction of an MTGC of the form $\phi_1 \, U_I \, \phi_2$, where the until operator
is inherited from MTL, it is still required that $\phi_2$ is satisfied at some
timepoint t' in the interval I relative to the current observation timepoint t and
that $\phi_1$ is continuously satisfied (by a possibly varying match for each timepoint)
for all timepoints preceding t'.
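The check described above, that all matched elements have already been created but not yet deleted at the observation timepoint, amounts to a simple interval test. The Python sketch below (hypothetical names) applies it to the GH of Example 2, with None standing for an absent dts attribute.

```python
import math

# GH of Example 2 (Fig. 4): element -> (cts, dts); None means no dts attribute.
GH = {"S": (5, None), "T": (10, None), "H": (10, None), "e1": (10, None),
      "e2": (10, None), "R": (13, None), "e3": (13, 15), "e4": (13, None)}


def alive(gh, elements, t):
    """max({0} ∪ cts(m(H))) <= t < min({inf} ∪ dts(m(H))) for m(H) = elements."""
    created = max([0] + [gh[a][0] for a in elements])
    deleted = min([math.inf] + [gh[a][1] for a in elements if gh[a][1] is not None])
    return created <= t < deleted
```

For instance, the elements T, S, and e1 matched by the outer operator in Example 1 are alive from timepoint 10 on, while R and e3 are alive exactly on [13, 15).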


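Definition 9 (Satisfaction of MTGCs by GHs). An MTGC $\psi \in \Phi^{MTGC}_H$ is
satisfied by a mono $m: H \hookrightarrow G_H$ and an observation timepoint $t \in \mathbb{R}^+_0$, written
$(m, t) \models_{GH} \psi$, if $\max(\{0\} \cup cts(m(H))) \le t < \min(\{\infty\} \cup dts(m(H)))$ and one
of the following cases applies.

- $\psi = \wedge\{\phi_1, \ldots, \phi_n\}$ and $(m, t) \models_{GH} \phi_i$ (for all $1 \le i \le n$).
- $\psi = \neg\phi$ and not $(m, t) \models_{GH} \phi$.
- $\psi = \exists(a: H \hookrightarrow H', \phi)$ and there is some $q: H' \hookrightarrow G_H$ such that $q \circ a = m$
and $(q, t) \models_{GH} \phi$.
- $\psi = \phi_1 \, U_I \, \phi_2$ and there is some $t' \in I$ such that $(m, t+t') \models_{GH} \phi_2$ and for
every $t'' \in [0, t')$ it holds that $(m, t+t'') \models_{GH} \phi_1$.

An MTGC $\psi$ over the empty graph is satisfied by a GH $G_H$, written $G_H \models_{GH} \psi$,
if $(i_{G_H}, 0) \models_{GH} \psi$, where $i_{G_H}: \emptyset \hookrightarrow G_H$ is the initial morphism to $G_H$.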
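Definition 9 quantifies over real-valued timepoints, but for a GH the truth of a condition can only change at the finitely many cts and dts values. The following Python sketch exploits this for the until case, under the stated assumption that phi1 and phi2 are given as functions of time that are constant between consecutive event timepoints and that I is a closed interval [lo, hi]. It searches for the earliest witness t', since a smaller t' only weakens the requirement on phi1. This is our illustration, not the procedure of the paper, which instead reduces MTGCs to GCs in Sect. 6; all names are hypothetical.

```python
def until_holds(phi1, phi2, t, lo, hi, events):
    """Decide phi1 U_[lo,hi] phi2 at observation timepoint t (cf. Def. 9).

    Assumes phi1 and phi2 are functions of time that are constant on each
    interval [e_i, e_{i+1}) between consecutive timepoints in events."""
    start, end = t + lo, t + hi
    # The earliest time in [start, end] where phi2 holds is start itself or an
    # event timepoint in (start, end], because phi2 only changes at events.
    candidates = [start] + sorted(e for e in events if start < e <= end)
    witness = next((c for c in candidates if phi2(c)), None)
    if witness is None:
        return False
    if witness == t:
        return True  # [t, t) is empty, so phi1 is not required anywhere
    # phi1 must hold on [t, witness): t and each event inside cover all pieces.
    return all(phi1(c) for c in [t] + [e for e in events if t < e < witness])


EVENTS = [5, 10, 13, 15]  # cts/dts values occurring in the GH of Example 2


def result_alive(x):
    """R and e3 are both alive on [13, 15) in the GH of Example 2."""
    return 13 <= x < 15


def no_second_handler(x):
    """The GH of Example 2 contains no second Handler at any time."""
    return True
```

With t = 10 and I = [0, 120], the witness is timepoint 13 and only timepoint 10 has to be checked for phi1, mirroring the argument of Example 1; shrinking I to [0, 2] makes the until condition fail.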


Note that the reasoning for the satisfaction of the MTGC $\psi$ from Fig. 3 by
$G_H = Fold(\pi)$ from Fig. 4 proceeds analogously to Example 1.
In the following theorem (see [11] for its proof), we state the compatibility of
the two satisfaction relations for the case of finite TGSs showing that they can 
be used interchangeably to determine the satisfaction of an MTGC in this case. 


Theorem 1 (Soundness of Operation Fold). If $\pi \in \Pi^{fin}$ and $\psi \in \Phi^{MTGC}_{\emptyset}$,
then $\pi \models_{TGS} \psi$ iff $Fold(\pi) \models_{GH} \psi$.
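Theorem 1 is proved in [11]. As a quick, non-authoritative sanity check of its simplest instance, the Python sketch below compares, for the running example, whether an element belongs to $\pi_t$ with whether its cts/dts interval in the folded GH contains t. It assumes inclusions and unique element names and compares element presence only, not full MTGCs; all names are hypothetical.

```python
import math


def fold_sets(graphs, deltas):
    """Fold G_0..G_n (sets of element names, inclusions l, r) with durations
    delta_0..delta_{n-1} into: name -> [cts, dts or None]."""
    gh = {a: [0, None] for a in graphs[0]}
    t = 0
    for before, after, delta in zip(graphs, graphs[1:], deltas):
        t += delta
        for a in before - after:
            gh[a][1] = t          # deleted by this step
        for a in after - before:
            gh[a] = [t, None]     # created by this step
    return gh


def in_pi_t(graphs, deltas, a, t):
    """Is a an element of pi_t (pi_t = G for t < delta, else recurse)?"""
    start = 0
    for graph, delta in zip(graphs, deltas):
        if t < start + delta:
            return a in graph
        start += delta
    return a in graphs[-1]


def alive_in_gh(gh, a, t):
    cts, dts = gh.get(a, (math.inf, None))
    return cts <= t and (dts is None or t < dts)


G2 = {"S", "T", "H", "e1", "e2"}
GRAPHS = [set(), {"S"}, G2, G2 | {"R", "e3", "e4"}, G2 | {"R", "e4"}]
DELTAS = [5, 5, 3, 2]
GH = fold_sets(GRAPHS, DELTAS)
```

For every element of the running example and every sampled timepoint, the two views agree, which is what Theorem 1 guarantees (in far greater generality) for arbitrary MTGCs.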


6 Reduction of MTGL to GCs 


We now introduce a procedure for checking the satisfaction of an MTGC by a 
GH using a reduction of an MTGC to a corresponding GC. Based on the Fold 
operation from the previous section, we thereby obtain a checking procedure for 
finite TGSs as well. Moreover, this reduction shows that MTGL is as expressive 
as the logic of GCs on finite TGSs (since every GC is trivially also an MTGC). 

We first present the operation Reduce for translating an MTGC into the 
corresponding GC and then show that this translation (also called reduction in 
the following) is compatible with our semantics for GHs and the operation Fold 
from before. The operation Reduce encodes in the resulting GC all parts of the
satisfaction relation $\models_{GH}$ that are not covered by the satisfaction relation $\models$ for
GCs. In particular, the operation Reduce removes all occurrences of the until
operator and encodes the check that the elements that are matched by the exists 
operator have all been created as well as that none of them has yet been deleted. 

Technically, we translate a GH $G_H = Fold(\pi)$ for a finite TGS $\pi$, an MTGC
$\psi \in \Phi^{MTGC}_\emptyset$, and an observation timepoint $t \in \mathbb{R}^+_0$ (where $G_H$ and $\psi$ are typed over
a type graph TG) into a graph $G'_H$ and a GC $\psi' \in \Phi^{GC}_\emptyset$ (where both are typed over
a changed type graph TG') using the procedure presented in Definition 10. We
obtain $\psi'$ from $\psi$ by encoding the until operator suitably and by implementing
the checks of cts and dts attributes according to Definition 9 for the exists and
until operators using attribute constraints, for which we add variables to $\psi$. We
also add the same variables to $G_H$ to obtain $G'_H$.


Definition 10 (Reduce MTGC to GC (Operation Reduce)). The recursive
operation Reduce takes three arguments: a GH $G_H$ that has been obtained by
application of the operation Fold to a TGS $\pi$, an observation timepoint $t \in \mathbb{R}^+_0$,
and an MTGC $\psi \in \Phi^{MTGC}_\emptyset$. $G_H$ and all graphs contained in $\psi$ are typed over
the type graph TG.

The operation Reduce returns a pair $(G'_H, \psi')$ consisting of a graph $G'_H$
(which is a slight modification of $G_H$) and a GC $\psi' \in \Phi^{GC}_\emptyset$. The graph $G'_H$ and
all graphs contained in $\psi'$ are typed over an adapted type graph TG' (called a
reduction type graph) introduced below.

1. (Construction of the reduction type graph TG'):
We adapt the original type graph TG to TG' by adding an Encoding node
with attributes num : int and var : real.

2. (Construction of the MTGC $\psi_{att}$ with cts and dts attributes):
We obtain $\psi_{att}$ from $\psi$ by adding the attributes cts = $x_{c,a}$ and dts = $x_{d,a}$ to
all nodes and edges a contained in graphs in $\psi$.
[Figure 5: left, the GC $\psi'$, whose nested conditions match T, S, e1 (variable $x_1$), H, e2 ($x_2$), R:Result with value = success and e3 ($x_4$), and a second Handler H' with e4 ($x_6$), each element with cts/dts variables, together with Encoding nodes $v_0, \ldots, v_6$ (num = 0, ..., 6, var = $x_0, \ldots, x_6$) and the attribute constraints $O_0 = \{x_0 = 10\}$, $O_1 = O_0 \cup \{x_1 = x_0\}$, $O'_1 = O_1 \cup \{\neg alive(x_1, \{T, S, e1\})\}$, $O_2 = O_1 \cup \{x_2 = x_1\}$, $O'_2 = O_2 \cup \{\neg alive(x_2, \{T, S, e1, H, e2\})\}$, $O_3 = O_2 \cup \{x_2 + 0 \le x_3, x_3 \le x_2 + 120\}$, $O_4 = O_3 \cup \{x_4 = x_3\}$, $O'_4 = O_4 \cup \{\neg alive(x_4, \{T, S, e1, H, e2, R, e3\})\}$, $O_5 = O_3 \cup \{x_2 \le x_5, x_5 < x_3\}$, $O_6 = O_5 \cup \{x_6 = x_5\}$, $O'_6 = O_6 \cup \{\neg alive(x_6, \{T, S, e1, H, e2, H', e4\})\}$; right, the graph $G'_H$, i.e., $G_H$ from Fig. 4 with dts = -1 for undeleted elements and the Encoding nodes $v_0, \ldots, v_6$ under $O = \{x_0 = 10, x_1 = x_0, x_2 = x_1, x_2 + 0 \le x_3, x_3 \le x_2 + 120, x_4 = x_3, x_2 \le x_5, x_5 < x_3, x_6 = x_5\}$]

Fig. 5. The GC $\psi'$ and the adapted graph $G'_H$ resulting from applying the operation
Reduce to the GH from Fig. 4, the timepoint t = 10, and the MTGC $\psi$ from Fig. 3
(where the outermost forall-new operator has been simplified to the forall operator)


3. (Construction of the GC $\psi'$):
$\psi' = \exists(i_{G_0}, Reduce_{rec}(\psi_{att}, x_0, G_0, \emptyset))$, where $G_0$ is the graph containing the
Encoding node $v_0$ with the attributes num = 0 and var = $x_0$ as well as the
attribute constraint $x_0 = t$, and $i_{G_0}: \emptyset \hookrightarrow G_0$ is the initial morphism to $G_0$.
Then, $Reduce_{rec}(\psi_{att}, x_o, G_a, G) = \psi'_{att}$ if one of the following cases applies
(where $\psi_{att}$ is the condition to be reduced, $x_o$ is the timepoint at which the
subcondition must be satisfied, $G_a$ is the graph containing additional nodes,
edges, and attribute constraints to be added to the graphs in the constructed
conditions, and G is the graph over which the condition $\psi_{att}$ is defined).

(a) $\psi_{att} = \wedge S$ and $\psi'_{att} = \wedge\{Reduce_{rec}(\phi, x_o, G_a, G) \mid \phi \in S\}$.

(b) $\psi_{att} = \neg\phi$ and $\psi'_{att} = \neg Reduce_{rec}(\phi, x_o, G_a, G)$.

(c) $\psi_{att} = \exists(a: H_1 \hookrightarrow H_2, \phi)$ and $\psi'_{att} = \exists(a': H'_1 \hookrightarrow H'_2, \neg\exists(m: H'_2 \hookrightarrow H''_2, true) \wedge Reduce_{rec}(\phi, x_n, G'_a, H'_2))$, where $G'_a$ equals the graph $G_a$ to
which an Encoding node $v_n$ with the attributes num = n and var = $x_n$
(where no Encoding node has been created in the reduction for n so far)
and the attribute constraint $x_n = x_o$ have been added, $H'_1 = G_a \cup H_1$,
$H'_2 = G'_a \cup H_2$, $H''_2$ equals the graph $H'_2$ to which the attribute constraints
$\neg alive(x_n, H_2)$ have been added,⁴ $a'$ is obtained as the union of a and the
identity morphism $id_{G_a}$, and m is an inclusion.

(d) $\psi_{att} = \phi_1 \, U_I \, \phi_2$ and $\psi'_{att} = \exists(m_0: G_0 \hookrightarrow G_1, Reduce_{rec}(\phi_2, x_{n_1}, G'_a, G_1) \wedge \forall(m_1: G_1 \hookrightarrow G_2, Reduce_{rec}(\phi_1, x_{n_2}, G''_a, G_2)))$, where $G'_a$ equals the
graph $G_a$ to which an Encoding node $v_{n_1}$ with the attributes num = $n_1$ and
var = $x_{n_1}$ (where no Encoding node has been created in the reduction for
$n_1$ so far) and the attribute constraints equivalent to $x_{n_1} - x_o \in I$ have been
added, $G_0 = G \cup G_a$, $G_1 = G \cup G'_a$, $m_0$ is an inclusion, $G''_a$ equals the
graph $G'_a$ to which an Encoding node $v_{n_2}$ with the attributes num = $n_2$ and
var = $x_{n_2}$ (where no Encoding node has been created in the reduction for
$n_2$ so far) and the attribute constraints equivalent to $x_{n_2} \in [x_o, x_{n_1})$
have been added, $G_2 = G_1 \cup G''_a$, and $m_1$ is an inclusion.

4. (Construction of the graph $G'_H$):

We obtain $G'_H$ by adding elements to $G_H$ as follows:

(a) We add the attribute dts = -1 to all nodes/edges without that attribute.

(b) We insert all Encoding nodes contained in graphs in $\psi'$ together with their
num = n and var = $x_n$ attributes.

(c) We add the attribute constraints added during the reduction except for the
alive constraints.
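The attribute constraints that cases (c) and (d) attach to the Encoding variables can be generated by a simple recursion. The Python sketch below is hypothetical in its representation (conditions as small classes, constraints as strings, closed intervals, alive constraints omitted as in step 4(c)); the order in which fresh variables are allocated (witness first, then the right subcondition, then the universally quantified variable and the left subcondition) is chosen so that, for the simplified MTGC of Example 3, the output coincides with the constraint set O shown in Fig. 5.

```python
from dataclasses import dataclass, field


@dataclass
class Exists:
    pattern: str        # label for the mono a: H1 -> H2
    sub: object = None  # None encodes true


@dataclass
class Not:
    sub: object


@dataclass
class And:
    subs: list = field(default_factory=list)


@dataclass
class Until:
    left: object
    lo: float
    hi: float           # closed interval [lo, hi] assumed
    right: object


def reduce_constraints(cond, t):
    """Constraints on x_0, x_1, ... produced by Reduce (Def. 10) for cond at t."""
    constraints = [f"x0 = {t:g}"]  # step 3: x_0 = t
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"x{counter}"

    def go(c, xo):
        if c is None:
            return
        if isinstance(c, And):            # case (a): no new variables
            for s in c.subs:
                go(s, xo)
        elif isinstance(c, Not):          # case (b): no new variables
            go(c.sub, xo)
        elif isinstance(c, Exists):       # case (c): x_n = x_o
            xn = fresh()
            constraints.append(f"{xn} = {xo}")
            go(c.sub, xn)
        elif isinstance(c, Until):        # case (d)
            x1 = fresh()                  # witness t' with x1 - x_o in I
            constraints.extend([f"{xo} + {c.lo:g} <= {x1}", f"{x1} <= {xo} + {c.hi:g}"])
            go(c.right, x1)
            x2 = fresh()                  # every t'' in [x_o, x1)
            constraints.extend([f"{xo} <= {x2}", f"{x2} < {x1}"])
            go(c.left, x2)

    go(cond, "x0")
    return constraints


# Example 3: psi from Fig. 3 with forall-new simplified to forall = not exists not.
PSI = Not(Exists("T,S,e1", Not(Exists("H,e2", Until(
    Not(Exists("H',e4")), 0, 120, Exists("R,e3"))))))
```

For PSI and t = 10, the sketch produces x0 = 10, x1 = x0, x2 = x1, the bounds of x3 relative to x2, x4 = x3, the bounds of x5 between x2 and x3, and x6 = x5.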


We now demonstrate how the operation Reduce can be applied to the MTGC 
from our running example. 


Example 3 (Reduce MTGC to GC). We now apply the operation Reduce to the GH
from Fig. 4, the timepoint t = 10, and the MTGC $\psi$ from Fig. 3, resulting in $G'_H$
and $\psi'$ given in Fig. 5. To simplify the presentation, we replaced the
enclosing forall-new operator by the forall operator to avoid the substitution of
the forall-new operator by its encoding from Sect. 4.


1. We add the attribute dts = $x_{d,a}$ to all nodes/edges a of $G_H$ without dts
attribute and add the attribute constraint $x_{d,a} = -1$ to the set of constraints.


4 For a graph H, $alive(x, H)$ equals $alive(x, S)$ for the disjoint union S of the nodes and
edges of H. For a set S of nodes and edges, $alive(x, S)$ equals $\bigcup\{alive(x, a) \mid a \in S\}$.
For a node or an edge a, $alive(x, a)$ equals $\{x_{c,a} \le x, \; x_{d,a} = -1 \vee x < x_{d,a}\}$.
With these additional attributes and the cts = $x_{c,a}$ attributes introduced by
the operation Fold, we are able to state the existence of nodes/edges at a
given timepoint $x_n$ using attribute constraints in the resulting GC $\psi'$.

2. We add a unique Encoding node to each graph in $\psi'$ as a container for
additional variables $x_n$ that are used in attribute constraints to encode the current
observation timepoint (the num attributes are included to decrease the
number of matches to be considered). Initially, we add an enclosing exists operator
with the attribute constraint $x_0 = t$ (see $O_0$), where t is the input observation
timepoint, which is 10 for this application of Reduce. Further attribute
constraints then relate the additional variables $x_n$ for existential/universal
quantifications (see $O_1$, $O_2$, $O_4$, and $O_6$). For the encoding of the until
operator, these observation timepoints ($x_3$ in $O_3$ and $x_5$ in $O_5$) are restricted to
some interval as described below.

3. We encode the exists operator $\exists(a : H_1 \to H_2, \phi)$ for the MTGC $\phi$ according to Definition 9 using an additional negative graph condition stating that the matched nodes/edges $a$ do not violate the attribute constraints in alive$(x_n, a)$. The set alive$(x_n, a)$ contains the constraint $x_{cts,a} \le x_n$ (to state that $a$ was created no later than $x_n$) and the constraint $x_{dts,a} = -1 \lor x_n < x_{dts,a}$ (to state that $a$ was not deleted or that it is deleted later than $x_n$).

4. We encode the until operator $\phi_1 \mathbin{U_I} \phi_2$ for the MTGCs $\phi_1$ and $\phi_2$ according to Definition 9 using the exists operator (the forall operator used in the GC below is only an abbreviation for a usage of the exists operator according to Definition 1). Informally, $\phi_1 \mathbin{U_{[t_1, t_2]}} \phi_2$ (the construction is similar for other kinds of intervals) is equivalent to $\exists(t' \in [x_n + t_1, x_n + t_2],\ \phi'_2 \land \forall(t'' \in [x_n + t_1, t'),\ \phi'_1))$, where $\phi'_1$ and $\phi'_2$ are the reductions of $\phi_1$ and $\phi_2$, respectively. The variable $x_n$ refers to the current observation timepoint, which depends on the timepoint at which an enclosing condition has been matched. In the example, the variables $x_n$, $t'$, and $t''$ are represented in $\psi'$ by the variables $x_2$, $x_3$, and $x_5$, respectively. The reduction is recursively applied to $\phi_1$ and $\phi_2$, resulting in $\phi'_1$ and $\phi'_2$, respectively. The replacement GC for the until subcondition spans the last four lines of $\psi'$ in Fig. 5.

5. We add all Encoding nodes occurring in $\psi'$ to $G_H$ as depicted in Fig. 5. The Encoding nodes are used in $\psi'$ as containers for the additional variables employed in the attribute constraints and are required in $G'_H$ to allow for matchings from the adapted graphs of $\psi'$ to $G'_H$.
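To make the alive constraints of footnote 4 concrete, the following Python sketch evaluates alive(x, S) for given numeric timestamps instead of emitting symbolic constraints for a solver. It assumes that dts = -1 marks an element that has not been deleted and that an element created exactly at x counts as alive (the creation constraint is read as $x_{cts,a} \le x$). The `Element` class, the function names, and the timestamps in the usage lines are hypothetical and are not taken from Fig. 4.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Element:
    """A node or edge of a graph with history (hypothetical representation)."""
    name: str
    cts: float          # creation timestamp
    dts: float = -1.0   # deletion timestamp; -1 encodes "not deleted"


def alive_at(x: float, a: Element) -> bool:
    """Evaluate both constraints of alive(x, a) for concrete values."""
    created = a.cts <= x                     # x_{cts,a} <= x
    not_deleted = a.dts == -1 or x < a.dts   # x_{dts,a} = -1 or x < x_{dts,a}
    return created and not_deleted


def alive_set(x: float, elements: Iterable[Element]) -> bool:
    """alive(x, S) collects the constraints of every a in S, so all must hold."""
    return all(alive_at(x, a) for a in elements)


# Hypothetical timestamps, chosen only for illustration.
T = Element("T", cts=2.0)
e1 = Element("e1", cts=3.0, dts=12.0)
S = Element("S", cts=0.0)
print(alive_set(10.0, [T, e1, S]))  # True
print(alive_set(12.0, [T, e1, S]))  # False: e1 is deleted at 12
```

In the actual reduction these comparisons stay symbolic (the timestamps are variables of sort real), which is why a constraint solver is needed rather than direct evaluation.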


In the following theorem (see [11] for its proof), we state that the operation Reduce is sound w.r.t. the satisfaction relations for MTGCs and GCs.


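Theorem 2 (Soundness of Operation Reduce). If $\pi$ is a finite TGS, $G_H = \text{Fold}(\pi)$, $\psi$ is an MTGC, $t \in \mathbb{R}_0^+$ is a timepoint, $i_{G_H} : \emptyset \to G_H$ is the initial morphism to $G_H$, and $(G'_H, \psi') = \text{Reduce}(G_H, t, \psi)$, then $(i_{G_H}, t) \models_{GH} \psi$ iff $i_{G'_H} \models_{GC} \psi'$, where $i_{G'_H} : \emptyset \to G'_H$ is the initial morphism to $G'_H$.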


By application of Theorem 2, we can deduce for our running example that the MTGC $\psi$ from Fig. 3, translated by the operation Reduce, is satisfied by the graph $G'_H$ (both given in Fig. 5). For this purpose, observe that $\psi$ from Fig. 3 (simplified as stated in Fig. 5) is satisfied by the GH from Fig. 4 for the timepoint $t = 10$, since the unique match of the Task node T, the on edge e1, and the System node S satisfies the remaining condition starting at timepoint $t = 10$.


7 Tool Support 


We provide tool support for checking finite TGSs against MTGCs as an extension of AUTOGRAPH [25]. First, we extended AUTOGRAPH to handle TGSs and MTGCs. Second, we implemented the operation Fold from Definition 8 to consolidate a TGS $\pi$ into a GH $G_H$. Third, we implemented the operation Reduce from Definition 10 to reduce an MTGC $\psi$ to a GC $\psi'$ and to adapt $G_H$ to a graph $G'_H$. Building on these three steps and as applications of our theoretical results (see Theorems 1 and 2), we then use the built-in support of AUTOGRAPH for checking whether the obtained graph $G'_H$ satisfies the reduced GC $\psi'$. Note that in this scenario AUTOGRAPH depends on the constraint solver Z3 [20] to check the satisfiability of expressions involving the values of cts and dts attributes of sort real, as well as the additional constraints introduced by Reduce that contain further variables of sort real.

Considering our running example, we observed negligible runtime and memory consumption when verifying with our implementation that the finite TGS $\pi$ from Fig. 2 satisfies the MTGC $\psi$ from Fig. 3, owing to the short length of $\pi$. Overall, the application of the AUTOGRAPH extension to our running example shows promising results, although there is potential for further efficiency improvements when handling more elaborate problem instances.


8 Conclusion and Future Work 


We defined Metric Temporal Graph Logic (MTGL) by integrating the metric temporal operator until with time bounds into the well-established logic of (nested) graph conditions (GCs). This new logic allows maintaining an established binding of graph elements throughout the analysis of a timed sequence of (typed attributed) graphs (TGSs). Furthermore, to enable a satisfaction check for MTGL conditions by finite TGSs, we introduced a mapping of a finite TGS $\pi$ into a graph with history $G_H = \text{Fold}(\pi)$ and defined a reduction of an MTGL condition $\psi$ to a GC $\psi'$ given by $(G'_H, \psi') = \text{Reduce}(G_H, 0, \psi)$, where the graph with history $G_H$ is extended to a graph $G'_H$. For this mapping and this reduction, we have proven that the satisfaction checks for the different representations are consistent (i.e., $\pi \models_{TGS} \psi \iff G_H \models_{GH} \psi \iff G'_H \models_{GC} \psi'$). Finally, we presented an extension of the tool AUTOGRAPH that allows checking the satisfaction of MTGL conditions by finite TGSs via the introduced mapping and reduction.

In the future, we want to develop checking procedures for bounded MTGL conditions such that only violations that hold for any possible continuation are reported. Moreover, we intend to use our reduction of MTGL conditions to related GC counterparts for invariant checking for graph transformation systems as considered in [7]. Furthermore, we want to develop extensions of MTGL that include branching, as in timed CTL, that are applicable to the setting of probabilistic timed graph transformation systems as introduced in [19], or that support additional features, e.g. permitting variables in the interval bounds of MTGL conditions or in attribute constraints. Finally, we intend to develop a model checking procedure for MTGL and these extensions. Besides these technical advancements, we intend to evaluate and compare our approach based on benchmarks from application domains such as runtime monitoring [18].
Abstract. Dynamic Software Updating (DSU) is a useful technique for updating running software without incurring any downtime. Its correctness must be guaranteed because updating running software is a complicated and safety-critical process. In this paper, we present a formal tool called KUPC for modeling and verifying dynamic updating of C programs. The tool is built on K, a formal semantic framework for programming languages. We formalize a patch-based dynamic updating mechanism in K based on the formal executable operational semantics of C. The formalization automatically yields an interpreter and several verification tools, which can be used to formally analyze the correctness of dynamic updating for C programs. To our knowledge, KUPC is the first formal tool for code-level verification of dynamic software updating.


1 Introduction 


Software systems require frequent updating to fix defects, improve performance, and add new features. For systems that provide a 24×7 service commitment, Dynamic Software Updating (DSU) is a useful technique, as it does not incur system downtime while updating [5]. Such systems are becoming prevalent with the diffusion of the Internet of Things (IoT) and Cyber-Physical Systems (CPS), where additions, modifications, and removals of behaviors should be possible in a quick and localized fashion. A comprehensive survey on DSU is given in [10].

The difficulty of guaranteeing the correctness of dynamic updating is a fundamental barrier to adopting this technique as widely as expected. Correctness is crucial to systems that need dynamic updating because they are usually safety-critical and highly dependable. Meanwhile, dynamically updating a running software system is a complicated process, and it is difficult to predict all possible updating results. In order to update a program successfully while it is running, in practice one has to know everything about that program [6]. However, effective methodologies and tools that help understand all possible behaviors of running programs caused by updating are still lacking.

Formal methods are rigorous approaches to program verification. Some attempts have been made to apply formal methods to DSU [3,4]. The existing approaches suffer from one or more of the following difficulties. In some approaches, formalizing a dynamic update may require abstraction of target programs. Such abstraction is usually done manually, and it requires both formal methods expertise and human intellection to interpret target programs. Some approaches [1,11] lack tool support, while developing such tools needs substantial effort.

To mitigate the above difficulties, we present in this paper a formal tool called KUPC for modeling and verifying dynamic updating of C programs. KUPC is built upon the formalization of a DSU tool called Ginseng [8] for C programs. We formalize the updating strategy of Ginseng atop the operational semantics of C in the formal semantic framework K [9]. From the formalization, K automatically generates several tools that can be used for formal analysis of dynamic updating of C programs. To our knowledge, KUPC is the first tool for the code-level formal verification of dynamic software updating.

KUPC has the following three features. (1) KUPC focuses on the code-level verification of dynamic updating. It does not require any abstraction or transformation of the target C programs that are subject to dynamic updating. (2) The verification functionalities of KUPC are automatically generated from the formalization of dynamic updating mechanisms, so no extra implementation effort is needed. (3) The formalization is built upon the operational semantics of the C language. One can easily develop similar tools for the formal analysis of dynamic updating of other languages such as Java and Python, whose operational semantics have already been formally defined in K.


2 KUPC Design


Patch-based DSU. Many DSU tools achieve dynamic updating by injecting patches into running programs [10]. A patch contains all updating contents, e.g., new functions and data. Figure 1 (left) gives an overview of the patch-based updating process. An old-version program is first made updatable by attaching additional version information, wrapping user-defined types, and inserting possible updating points. These steps are achieved by two operations called Dependants Updating and Restriction Generating. Next, a patch file p1.c is generated by comparing the differences between the old and new programs, and then compiled. After an update request is invoked, a DSU tool checks whether it is safe to inject the compiled patch whenever the running program reaches a pre-specified updating point. Safety means that the behavior of the updated program is consistent with the expectation; it is guaranteed by the updating policies adopted in DSU tools.
Fig. 1. Patch-based dynamic updating and its formalization using K


If it is safe, the patch is injected and the running program state is transformed 
into the new version by a transformation function that is predefined in the patch. 
The patched program continues to execute from the new state. If updating at 
this point is not safe, the program continues to execute the old version. 
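The patch-based process described above can be sketched as a small Python model. This is a simplification, not Ginseng's or KUPC's implementation: the `Patch` and `Program` classes and `try_update` are hypothetical, and safety is approximated by a per-point set of restricted (un-updatable) functions. Code injection and state transformation happen in one step before execution resumes, mirroring the atomic updating described in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Patch:
    """Hypothetical patch: new function bodies plus a state transformer."""
    functions: Dict[str, Callable[[dict], object]]
    transform: Callable[[dict], dict]


@dataclass
class Program:
    """Hypothetical running program: current code, current state, version."""
    functions: Dict[str, Callable[[dict], object]]
    state: dict
    version: int = 1


def try_update(prog: Program, patch: Patch, point: str,
               restricted_at: Dict[str, Set[str]]) -> bool:
    """Inject `patch` at update point `point` if it is safe there.

    Simplified safety policy (an assumption): no patched function may be
    restricted at this point. Otherwise the old version keeps running.
    """
    if any(f in restricted_at.get(point, set()) for f in patch.functions):
        return False  # unsafe here: continue executing the old version
    prog.functions = {**prog.functions, **patch.functions}  # new code
    prog.state = patch.transform(prog.state)                # new state
    prog.version += 1
    return True


# Usage: a hypothetical "calc" function gains a cost component.
prog = Program({"calc": lambda s: s["dist"]}, {"dist": 6})
patch = Patch({"calc": lambda s: (s["dist"], s["cost"])},
              lambda s: {**s, "cost": 3})
restricted_at = {"point2": {"calc"}}
print(try_update(prog, patch, "point2", restricted_at))  # False
print(try_update(prog, patch, "point1", restricted_at))  # True
print(prog.functions["calc"](prog.state))                # (6, 3)
```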

It is worth mentioning that the entire updating process is performed atomically, that is, execution remains suspended until the update is complete. Atomic updating is the most consistent approach; it simplifies the updating process and reduces unexpected errors.


The K Framework. K [9] is a state-of-the-art semantic framework for programming languages. Many mainstream languages such as C and Java have been completely defined in K, so one only needs to focus on the formalization of an updating mechanism using the predefined operational semantics of the target language. After the updating mechanism is formalized, K automatically generates several analysis tools such as a program interpreter, a state space explorer, and a model checker.


Formalization of dynamic updating strategy in K. The basic idea of formalizing a dynamic updating mechanism in K is to formalize the functionalities of the mechanism on the basis of the operational semantics of the target programming language that the mechanism supports. The right part of Fig. 1 shows the formalization of the patch-based dynamic updating mechanism, which consists of the formalizations of its five functionalities.

The functionalities of an updating mechanism are formalized by a set of rewrite rules. For instance, the following rewrite rule formalizes the functionality of checking the safety of updating a set of functions at an updating point Loc.


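$$\langle\, \mathit{TypeSafety}(Loc, (F \Rightarrow \cdot)\ \_)\ \cdots \rangle \quad \langle \cdots\ Loc \mapsto Re\ \cdots \rangle_{\mathit{restriction}} \quad \langle \cdots\ F \mapsto (T, T')\ \cdots \rangle_{\mathit{types}}$$
$$\textbf{when}\ \ ((F \in Re) \land (T == T')) \lor (F \notin Re) \qquad (\textsc{SafetyChecking})$$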
Old version (left):

1  struct Road{
2    int dist;
3  };
4  struct City{
5    ... // node structure
6  };
7  struct Graph{
8    ... // Road+City..
9  };
10 struct Graph G;
11 void Shortest(int x){
12   ... // shortest path..
13 }
14 void PrintR(int x){
15   ... // print results..
16 }
17 void LoadG(){
18   ... // load graph data..
19 }
20 void Calculate(int x){
21   LoadG();
22   Shortest(x);
23 }
24 void Query(int x,int y){
25   /* point1 */
26   Calculate(x);
27   /* point2 */
28   PrintR(x,y);
29   /* point3 */
30 }
31 int main(){
32   ...
33   Query(0,6);
34   ...
35   Query(0,6);
36   ...
37 }

New version (right):

1  struct Road{ // modified structure
2    int dist;
3    int cost; // new element
4  };
5  void Cheapest(int x){ // newly added
6    ... // new function
7  }
8  void PrintR(int x){ // modified
9    ... // print results and
10   ... // the cheapest path
11 }
12 void LoadG(){ // modified
13   ... // load new data
14 }
15 void Calculate(int x){ // modified
16   LoadG();
17   Shortest(x);
18   Cheapest(x);
19 }


Fig. 2. The snippets of old-version and new-version programs of a GPS application 


In the rule, a pair of brackets is a labeled cell representing a piece of program execution information. $F \Rightarrow \cdot$ means that F is deleted from the set if the condition following the keyword when is true. The condition says that either F is updatable (represented by $F \notin Re$) or it is un-updatable at the point Loc but its types $T$ and $T'$ (before and after updating, respectively) are the same. Here, Re is the set of un-updatable contents at Loc. If the second argument of TypeSafety becomes an empty set, all the functions in the set are safe to update.
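The effect of exhaustively applying the SafetyChecking rule can be mimicked by the following Python sketch: each function whose side condition holds is removed from the pending set, and the update is considered safe at Loc exactly when the set becomes empty. The function names, type strings, and restriction sets in the usage lines are hypothetical; in KUPC the check is carried out by K rewriting, not by code like this.

```python
from typing import Dict, Set


def type_safety(functions: Set[str], restricted: Set[str],
                old_types: Dict[str, str], new_types: Dict[str, str]) -> Set[str]:
    """Mimic exhaustive application of the SafetyChecking rule.

    F is removed when ((F in Re) and T == T') or (F not in Re); the
    returned set holds the functions that could not be shown safe.
    """
    pending = set(functions)
    for f in sorted(pending):  # iterate over a copy while discarding
        same_type = old_types.get(f) == new_types.get(f)
        if (f in restricted and same_type) or f not in restricted:
            pending.discard(f)
    return pending


def safe_to_update(functions: Set[str], restricted: Set[str],
                   old_types: Dict[str, str], new_types: Dict[str, str]) -> bool:
    """The update is safe at Loc when the pending set becomes empty."""
    return not type_safety(functions, restricted, old_types, new_types)


old = {"LoadG": "void()", "PrintR": "void(int)"}
new = {"LoadG": "void()", "PrintR": "void(int,int)"}  # hypothetical change
print(safe_to_update({"LoadG", "PrintR"}, {"LoadG"}, old, new))   # True
print(safe_to_update({"LoadG", "PrintR"}, {"PrintR"}, old, new))  # False
```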

In total, we defined 371 rewrite rules to formalize the updating mechanism of Ginseng. We tested the correctness of the rules using the example dynamic updating programs provided with Ginseng. These rules are seamlessly compiled by K together with the rules defined for the operational semantics of C [2]. The compilation yields the formal tool KUPC, which supports formal analysis of dynamic updating of C programs in various ways such as simulation, state exploration, and LTL model checking.


3 KUPC Usage


KUPC is equipped with an interpreter to execute updatable C programs, a state space explorer to search for all possible updating results, and an LTL model checker to verify temporal properties of dynamic updating. We demonstrate the usage of KUPC using a dynamic update to a GPS application. The tool, examples, and a demo video are available at https://github.com/dexter-qjq/KupC.

The program in Fig. 2 (left) is the old version of a GPS system, which calculates the shortest path. The new version in Fig. 2 (right) not only shows the shortest path but also finds the most economical path. Three update points are inserted in function Query, from Line 24 to Line 30.
(a) Original graph (distance) (b) Updated graph (distance & cost)


Fig. 3. The shortest path before and after updating (Color figure online) 


Simulating a dynamic updating scenario. Given an original C program annotated with update points, KUPC can compile it with a patch file and generate binary code that is executable on K. During execution, the update is applied once a safe updating point is reached. This simulates the behavior of dynamically updating a program that is running on a real-world operating system.

Figure 3 shows the results of the simulation. Figures 3(a) and (b) show the original graph and the updated graph, respectively. When the update takes place at point1, the output of the first call is the red path in Fig. 3(a), while the second call produces the two paths shown in Fig. 3(b). The red one is the shortest path and the green one is the most economical path.


Case   | Update at   | Output
Case 1 | point1      | "7km $3; 7km $3"
Case 2 | point1      | "6km; 7km $3"
Case 3 | point3      | "6km; 7km $3"
Case 4 | point3      | "6km; 6km"
Case 5 | (no update) | "6km; 6km"


Fig. 4. All possible updating results searched by the state space explorer of KUPC 


Exploring all dynamic updating results. In addition to simulating one 
possible updating scenario, KUPC can search for all possible updating results by 
exploring each possible updating point using the state space explorer. 

We compile and execute the program map with the option UPSEARCH=1 to invoke the state exploration function. Figure 4 shows all five different updating results. The outputs are divided into two parts by a semicolon, representing the results of the two calls of the function Query, respectively. Case 1 and Case 2 show the results when updating occurs at point1. Case 3 and Case 4 are for point3. Case 5 shows the result when updating is not performed.

When the dynamic update occurs at point3 during the first call of the function Query (Case 3), the output of the first call is not affected by the update, because the updated content does not take effect until the next access after updating. For the same reason, the outputs in Case 4 are exactly the same as those in Case 5. Updating at point2 violates the safety policies; therefore, there is no case corresponding to point2. All the updating results found are valid.
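The five results in Fig. 4 can be reproduced by a toy enumeration over update choices. The Python sketch below is an abstraction under stated assumptions, not KUPC's state space explorer: each choice is a pair (call of Query, update point) or no update at all, point2 is excluded because it violates the safety policy, an update at point1 is assumed to affect the current and all later calls, and an update at point3 is assumed to affect only later calls. The constant and function names are hypothetical.

```python
from typing import List, Optional, Tuple

OLD_OUT, NEW_OUT = "6km", "7km $3"   # per-call outputs as reported in Fig. 4
POINTS = ("point1", "point2", "point3")
UNSAFE = {"point2"}                  # rejected by the safety policy

Choice = Optional[Tuple[int, str]]   # (call number, update point) or None


def run(update_at: Choice) -> str:
    """Simulate main(): two calls to Query with at most one update."""
    updated = False
    outputs = []
    for call in (1, 2):
        if update_at == (call, "point1"):
            updated = True           # Calculate and PrintR run the new code
        outputs.append(NEW_OUT if updated else OLD_OUT)
        if update_at == (call, "point3"):
            updated = True           # takes effect only from the next call
    return "; ".join(outputs)


def explore() -> List[Tuple[Choice, str]]:
    """Enumerate every safe update choice, including no update at all."""
    choices: List[Choice] = [(c, p) for c in (1, 2)
                             for p in POINTS if p not in UNSAFE]
    return [(ch, run(ch)) for ch in choices + [None]]


for choice, out in explore():
    print(choice, out)
```

Under these assumptions, Case 1 corresponds to (1, "point1"), Case 2 to (2, "point1"), Case 3 to (1, "point3"), Case 4 to (2, "point3"), and Case 5 to no update.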


Model checking temporal properties. Dynamic updating is a temporal behavior in that the properties before and after updating may differ. Such differences can be formalized as temporal properties. Another attractive function of KUPC is to verify these temporal properties using LTL model checking.

As an example, we verify whether the update in the GPS example can eventually be deployed. First, we introduce an atomic proposition called __update, which is false before updating and becomes true after the program is updated. Given the command UPLTLMC="TrueLtl ULtl __update" ./map, KUPC returns true, indicating that the update can eventually be performed.

Another property of interest is that the shortest path must become 7 after the system is updated. It can be defined as the LTL formula __update->(<>(x==7)), where variable x stores the value of the shortest path. Given the command UPLTLMC="~Ltl __update \/Ltl (TrueLtl ULtl (x==7))" ./map, KUPC returns true, indicating that the updating result is correct as expected.
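The formula TrueLtl ULtl __update is the until-based encoding of "eventually __update". KUPC checks it with K's LTL model checker over the program's state space; the following Python sketch only illustrates the meaning of p U q on a single finite trace, which is a simplification of LTL over infinite paths. The trace and all helper names are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Prop = Callable[[State], bool]


def holds_until(trace: List[State], p: Prop, q: Prop, i: int = 0) -> bool:
    """Finite-trace reading of (p U q) at position i: q holds at some
    j >= i and p holds at every position from i up to, not including, j."""
    for j in range(i, len(trace)):
        if q(trace[j]):
            return True
        if not p(trace[j]):
            return False
    return False  # q never held on this finite trace


def always_true(s: State) -> bool:
    return True                  # plays the role of TrueLtl


def updated(s: State) -> bool:
    return bool(s["__update"])   # plays the role of __update


# Hypothetical trace in which the update happens in the third state.
trace = [{"__update": False, "x": 6},
         {"__update": False, "x": 6},
         {"__update": True, "x": 7}]
print(holds_until(trace, always_true, updated))  # True
```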


4 Concluding Remarks and Ongoing Work 


We have presented the design and implementation of an operational-semantics-based verification tool called KUPC for dynamic software updating. Three case studies showed the effectiveness of KUPC for the formal analysis of dynamic software updating of C programs by simulation, state exploration, and LTL model checking. Semantics-based formalization is promising for providing effective and practical solutions that guarantee the correctness of dynamic software updating. For instance, Lounas et al. achieved formal verification of dynamic updating of Java programs based on Java's semantics [7]. Compared with their approach, our approach is more general and extensible, as K provides an elegant semantic framework for defining programming languages and an easy-to-use service for generating automated verification tools.

KUPC is in a good position for practical code-level verification of DSU. It is directly applicable to the code and shows the feasibility of formalizing a dynamic updating mechanism on the basis of the operational semantics of target programming languages. To verify the dynamic updating of more complex and practical programs, a complete semantics of C, including that of its standard libraries, is needed. The efficiency of KUPC also needs to be examined, although the efficiency of K has been validated [9]. Work in these directions is ongoing.

KUPC has some limitations because of theoretical and practical challenges in the formal verification of DSU. Theoretically, Gupta et al. have shown that the reachability of updating points is undecidable [3]. Another issue is that there is no uniform definition of the correctness of dynamic updating. The logical correctness of dynamic updating depends on target programs, and its formalization relies on programmers' interpretation. Although KUPC does not require any abstraction of target programs, we suspect that a certain degree of abstraction is necessary for optimizing the efficiency and scalability of the verification. For instance, a function that is not modified in a new version can be considered atomic for verification purposes. Finding an appropriate abstraction of target programs that improves scalability while maintaining the validity of verification remains an open quest.
Abstract. PLEAK is a tool to capture and analyze privacy-enhanced business process models in order to characterize and quantify to what extent the outputs of a process leak information about its inputs. PLEAK incorporates an extensible set of analysis plugins, which enable users to inspect potential leakages at multiple levels of detail.


1 Introduction 


Data minimization is a core tenet of the European General Data Protection Regulation (GDPR) [2]. According to GDPR, usage of private data should be limited to the purpose for which it has been collected. To verify compliance with this principle, privacy analysts need to determine who has access to the data and what private information these data may disclose. Business process models are a rich source of metadata to support this analysis. Indeed, these models capture which tasks are performed by whom, what data are taken as input and output by each task, and what data are exchanged with external actors. Process models are usually captured using the Business Process Model and Notation (BPMN).

This paper introduces PLEAK¹, the first tool to analyze privacy-enhanced BPMN models in order to characterize and quantify to what extent the outputs of a process leak information about its inputs. PLEAK's analyses are organized in three levels. The top level (Boolean level, Sect. 2) tells us whether or not a given data object in the process may reveal information about a given input. The middle level, the qualitative level (Sect. 3), goes further by indicating which attributes of (or functions over) a given input data object are potentially leaked by each output, and under what conditions this leakage may occur. The lowest level quantifies to what extent a given output leaks information about an input, either in terms of a sensitivity measure (Sect. 4) or in terms of the guessing advantage that an attacker gains by having the output (Sect. 5).


1 https://pleak.io (account: demo@example.com, password: pleakdemo, manual: 
https://pleak.io/wiki/, source code: https://github.com/pleak-tools/). 
© The Author(s) 2019 
Fig. 1. Aid distribution process


To illustrate the capabilities of PLEAK, we refer to the “aid distribution” process in Fig. 1. This process starts when a nation requests aid from the international community to handle an emergency and a country offers to route a ship to help transport people and/or goods. The goal of the process is to allocate a port and a berth to the ship without revealing information about ships that are unable to help or about the parameters of the ports. The process uses a type of privacy-enhancing technology (PET) known as secure multiparty computation (MPC). MPC allows participants to perform joint computations such that none of the parties gets to see the data of the other parties, but they can learn the output that depends on the private inputs. Given a ship, a deadline, and the list of ports, task “Compute reachable ports” retrieves the list of ports reachable by the deadline. Tasks with identical names in different pools denote MPC computations carried out jointly by multiple stakeholders. Task “Select feasible ports” retrieves ports with the capacity to host the ship. The third task selects a port, a berth, and a slot for the ship, and discloses them to both participants.


Related Work. We are interested in privacy analysis of business processes, and in this space Anica [1] is closest to our work. However, PLEAK's analysis is more fine-grained. Anica allows designers to see that a given object O1 may contain information derived from a sensitive data object O2, but it can explain neither how the data in O1 is derived from O2 (cf. Leaks-When analysis) nor to what extent the data in O1 leaks information about O2 (cf. sensitivity and guessing advantage analysis). In addition, Anica focuses on security levels, whereas our high-level analysis considers the PETs deployed in the process.


2 PE-BPMN Editor and Simple Disclosure Analysis 


The model in Fig. 1 is captured in Privacy-Enhanced BPMN (PE-BPMN) [7,8]. PE-BPMN uses stereotypes to distinguish the PETs used, e.g. MPC or homomorphic encryption, which affect which data is protected in the process. The PE-BPMN editor allows users to attach stereotypes to model elements and to enter the stereotype's parameters where applicable. The editor integrates a checker, which verifies stereotype-specific restrictions, for example: (1) when a task has an MPC stereotype, there is at least one other “twin” task with the same label in another pool, since an MPC computation involves at least two parties; (2) when one of these tasks is enabled, the other twin tasks are eventually enabled; and (3) the joint computation has at least one input and one output.
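Restrictions (1) and (3) can be checked on a plain list of tasks. The Python sketch below uses a hypothetical `Task` record (label, pool, MPC flag, inputs, outputs) rather than PLEAK's internal model, and restriction (2) is not modelled because it requires reasoning about control flow. The task list in the usage lines is an illustrative fragment in which one MPC task deliberately lacks a twin.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical task record; not PLEAK's internal data model."""
    label: str
    pool: str
    mpc: bool = False
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def check_mpc(tasks: List[Task]) -> List[str]:
    """Report violations of restrictions (1) and (3) for MPC tasks."""
    groups: Dict[str, List[Task]] = defaultdict(list)
    for t in tasks:
        if t.mpc:
            groups[t.label].append(t)   # twins share the same label
    errors = []
    for label, group in groups.items():
        if len({t.pool for t in group}) < 2:          # restriction (1)
            errors.append(f"{label}: no twin task in another pool")
        has_in = any(t.inputs for t in group)           # restriction (3)
        has_out = any(t.outputs for t in group)
        if not (has_in and has_out):
            errors.append(f"{label}: joint computation lacks input or output")
    return errors


tasks = [
    Task("Compute reachable ports", "Nation", True, ["ports"]),
    Task("Compute reachable ports", "Ship manager", True,
         ["ship", "deadline"], ["reachable ports"]),
    Task("Select feasible ports", "Nation", True,
         ["reachable ports"], ["feasible ports"]),
]
print(check_mpc(tasks))  # ['Select feasible ports: no twin task in another pool']
```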

Given a valid PE-BPMN model, PLEAK runs a binary privacy analysis, which produces a simple disclosure report and a data dependency matrix. The disclosure report in Fig. 2 tells us whether or not a stakeholder gets to see a given data object. In the report, “V” indicates that a data object (in columns) is visible to a stakeholder (in rows). Marker “H” (hidden) is used for data with cryptographic protection, e.g. encrypted data. Row “Shared over” refers to the network service provider, who may also see some of the data (e.g. unencrypted data objects).


[Figure: disclosure matrix with data objects as columns (berth, feasible ports, parameters, port, reachable ports, ship, ship slot assignment, slot) and stakeholders as rows (Nation, Ship manager, Shared over)]


Fig. 2. Simple disclosure report for the aid distribution process in Fig. 1 


3 Qualitative Leaks-When Analysis 


Leaks-When analysis [3] is a technique that takes as input a SQL workflow and determines, for each (output, input) pair, which attributes, if any, of the input object are disclosed by the output object and under which conditions. A SQL workflow is a BPMN process model in which every data object corresponds to a database table, defined by a table schema, and every task is a SQL query that transforms the input tables of the task into its output tables. Figure 3 shows a sample collaborative SQL workflow, a variant of the “aid distribution” example in which information about ships is disclosed to the aid-requesting country incrementally. The figure shows the SQL workflow alongside the query corresponding to task “Select reachable ports”. All data processing tasks and input data objects are specified analogously.

To perform a Leaks-When analysis, the user selects one or more output data objects and clicks the “SQL LeaksWhen” button. The Leaks-When analysis shows one tab for each output data object and one report for each column in the output table. The report is generated by extracting all runs of the workflow and applying dataflow analysis techniques to each run in order to infer all relevant data dependencies. An example of a Leaks-When report (in graphical form) is shown in Fig. 4. The first input to Filter is the disclosed value (leaks branch), e.g. the arrival time. The second input (when branch) is the condition for outputting the first input, e.g. that the arrival time is less than the deadline and the ship has the required name. Each Leaks-When report ends with such a filter, while the rest of the graph aggregates the computations described in SQL.


Fig. 3. Aid distribution SQL workflow in PLEAK SQL editor


4 Sensitivity Analysis and Differential Privacy 


The sensitivity of a function is the maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism [6]. Differential privacy ensures that it is difficult for an attacker who observes the query output to distinguish between two input databases that are sufficiently “close” to each other, e.g. that differ in one row. PLEAK tells the user how to sample noise to achieve differential privacy and how this affects the correctness of the output. PLEAK provides two methods, global and local, to quantify the sensitivity of a task in a SQL workflow or of an entire SQL workflow. These methods can be applied to queries that output aggregations (e.g. count, sum, min, max).

Global sensitivity analysis [5] takes as input a database schema and a query, and computes theoretical bounds on the sensitivity that hold for any instance of the database. This shows how the output changes if we add (remove) a row to (from) some input table. The analysis output is a matrix that shows the sensitivity w.r.t. each input table separately. It supports only COUNT queries.

Fig. 4. Sample leaks-when report
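As background for how sensitivity calibrates noise, the following Python sketch shows the standard Laplace mechanism for a COUNT query, whose global sensitivity is 1 because adding or removing one row changes the count by at most 1. It is not PLEAK's derivative-sensitivity method (which additionally involves a smoothness parameter); the function names are hypothetical.

```python
import math
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials
    with rate 1/scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0,
                rng: Optional[random.Random] = None) -> float:
    """Laplace mechanism: noise scale = sensitivity / epsilon.
    For COUNT, adding or removing one row changes the result by at most 1."""
    rng = rng if rng is not None else random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def prob_within(c: float, scale: float) -> float:
    """P(|noise| <= c) for Laplace(0, scale) noise."""
    return 1.0 - math.exp(-c / scale)


rng = random.Random(0)
print(noisy_count(42, epsilon=1.0, rng=rng))  # 42 plus Laplace(0, 1) noise
print(prob_within(math.log(5), scale=1.0))    # about 0.8
```

A smaller ε or a larger sensitivity increases the noise scale, which is the trade-off between privacy and output correctness that PLEAK reports to the user.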

Sometimes, the global sensitivity may be very large or even infinite. Local sensitivity analysis is an alternative approach. It requires as input not only a schema and a query, but also a particular instance of the underlying database, and it tells how the output changes relative to that given input. Using the database instance reduces the amount of noise needed to ensure differential privacy w.r.t. the number of rows. Moreover, local analysis supports COUNT, SUM, MIN and MAX aggregations, and it can capture more interesting distances between input tables, such as a change in a particular attribute of some row. In PLEAK, we have investigated a particular type of local sensitivity, called derivative sensitivity [4], which is primarily adapted to continuous functions and is closely related to the function derivative. PLEAK uses derivative sensitivity to quantify the required amount of noise as described in [4].
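A MAX query shows the difference between global and local sensitivity. The C sketch below is a deliberately simplified illustration, not PLEAK's derivative sensitivity. It assumes the only allowed change is removing one row, and it computes by brute force how much MAX changes on the given instance. Over all possible instances of an unbounded attribute this change is unbounded. On a concrete instance it is just the gap between the largest and second-largest values.

```c
#include <assert.h>
#include <stddef.h>

/* MAX over all rows except row `skip` (pass skip == n to keep all rows).
 * Assumes at least one row remains. */
static double max_except(const double *v, size_t n, size_t skip) {
    double m = 0.0;
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == skip) continue;
        if (!found || v[i] > m) { m = v[i]; found = 1; }
    }
    assert(found);
    return m;
}

/* Local sensitivity of MAX on this instance, restricted to the removal
 * of a single row: max over i of MAX(v) - MAX(v without row i). */
double local_sensitivity_max_removal(const double *v, size_t n) {
    assert(n >= 2);
    double full = max_except(v, n, n);
    double ls = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = full - max_except(v, n, i); /* always >= 0 */
        if (d > ls) ls = d;
    }
    return ls;
}
```

Take a hypothetical speed column {20, 35, 80, 78}. Removing the row with 80 drops MAX to 78, so this removal-only local sensitivity is 2.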

An example of derivative sensitivity analysis output is shown in Fig. 5a. It shows that the derivative sensitivity w.r.t. the Ship table is 4, and that a differential privacy level of ε = 1 can be achieved using smoothness parameter β = 0.05. To this end, we would have to add an amount of (Laplacian) noise such that the relative error of the output is 74%. More precisely, if the correct output is y, the noised answer will be between 0.26y and 1.74y with probability 80%. A tutorial on the sensitivity analyzer can be found at https://pleak.io/wiki/sql-derivative-sensitivity-analyser. More examples can be found in the full version of this paper [9].
Fig. 5. Examples of quantitative analysis

5 Attacker’s Guessing Advantage


While function sensitivity as defined in Sect. 4 can be used directly to compute the noise required to achieve ε-differential privacy, it is in general not clear which ε is good enough, and the “goodness” depends on the data and the query [6]. We want a more standard security measure, such as guessing advantage. It is defined as the difference between the posterior (after observing the output) and prior (before observing the output) probabilities of the attacker guessing the input.

The guessing advantage analysis of PLEAK takes as input the desired upper bound on the attacker’s advantage, which ranges between 0% and 100%. The user specifies a particular subset of attributes that the attacker is trying to guess for some data table record, within a given precision range. The user may also define the attacker's prior knowledge, which is currently expressed as an upper and a lower bound on an attribute. The analyzer internally converts these values to a suitable ε, and computes the noise required to achieve the bound on the attacker’s advantage.

Figure 5b shows example parameters and output of this analysis. The attacker already knows that the longitude and latitude of a ship are in the range [0...300], while the speed is in [20...80]. His goal is to learn the location of any ship with a precision of 5 units. If we want to bound the guessing advantage by 30% using differential privacy, the relative error of the output will be 43.25%. For a tutorial, see https://pleak.io/wiki/sql-guessing-advantage-analyser.
Abstract. Massive parallelism and energy efficiency of GPUs, along with advances in their programmability with the OpenCL and CUDA programming models, have made them attractive for general-purpose computations across many application domains. Techniques for testing GPU kernels have emerged recently to aid the construction of correct GPU software. However, there exists no means of measuring the quality and effectiveness of tests developed for GPU kernels. Traditional coverage criteria for CPU programs are not adequate for GPU kernels, as these use a completely different programming model and the faults encountered may be specific to the GPU architecture.

We address this need in this paper and present a framework, CLTestCheck, for assessing the quality of test suites developed for OpenCL kernels. The framework has the following capabilities: (1) it measures kernel code coverage using three different coverage metrics that are inspired by faults found in real kernel code; (2) it seeds different types of faults in kernel code and measures the fault finding capability of the test suite; (3) it simulates different work-group schedules to check for potential deadlocks and data races with a given test suite. We conducted an empirical evaluation of CLTestCheck on a collection of 82 publicly available GPU kernels and test suites. We found that CLTestCheck is capable of automatically measuring the effectiveness of test suites, in terms of kernel code coverage, fault finding and revealing data races in real OpenCL kernels.


Keywords: Testing - Code coverage - Fault finding - Data race - 
Mutation testing - GPU - OpenCL 


1 Introduction 


Recent advances in the programmability of Graphics Processing Units (GPUs), accompanied by the advantages of massive parallelism and energy efficiency, have made them attractive for general-purpose computations across many application domains [19]. However, writing correct GPU programs is a challenge for many reasons [13]:

- a program may spawn millions of threads, clustered in multi-level hierarchies, which makes it difficult to analyse;
- the programmer is responsible for ensuring that concurrently executing threads do not conflict, by checking that they access disjoint parts of memory;
- complex striding patterns of memory accesses are hard to reason about;
- the GPU work-group execution model and thread scheduling vary from platform to platform, and the assumptions are not explicit.

As a consequence of these factors, GPU programs are difficult to analyse with existing static or dynamic approaches [13]. Static techniques are thwarted by the complexity of the sharing patterns. Dynamic techniques are challenged by the combinatorial explosion of thread interleavings and of the space of possible data inputs. Given these difficulties, it becomes important to understand the extent to which a GPU program has been analysed and tested, and which code portions may need further attention.

© The Author(s) 2019

In this paper, we focus on GPU program testing and address concerns about the quality and adequacy of tests developed for GPU programs. We present a framework, CLTestCheck, that measures test effectiveness over GPU kernels written using the OpenCL programming model [7]. The framework has three main capabilities.

The first capability is a technique called schedule amplification, which checks the execution of test inputs over several work-group schedules. Existing GPU architectures and simulators do not provide a means to control work-group schedules. The OpenCL specification provides no execution model for inter work-group interactions [21]. As a result, the ordering of work-groups when a kernel is launched is non-deterministic, and there is presently no means of checking the effect of schedules on test execution. We provide this monitoring capability. For a test case T_i in test suite TS, instead of simply executing it once with an arbitrary schedule of work-groups, we execute it many times with a different work-group schedule in each execution. We build a simulator that can force work-groups in a kernel execution to execute in a certain order. This is done in an attempt to reveal test executions that produce different outputs for different work-group schedules, which point to problems in inter work-group interactions.

The second capability of CLTestCheck is measuring code coverage for OpenCL kernels. The structures we chose to cover were motivated by OpenCL bugs found in public repositories like GitHub and in research papers on GPU testing. We define and measure coverage over synchronisation statements, loop boundaries and branches in OpenCL kernels.

The final capability of the framework is creating mutations by seeding differ- 
ent classes of faults relevant to GPU kernels. We assess the effectiveness of test 
suites in uncovering the seeded faults. 

We empirically evaluate CLTestCheck using 82 kernels and associated test 
input workloads from industry standard benchmarks. The schedule ampli- 
fier in CLTestCheck was able to detect deadlocks and inter work-group data 
races in benchmarks. We were able to detect barrier divergence and kernel 
code that requires further tests using the coverage measurement capabilities 
of CLTestCheck. Finally, the fault seeding capability was able to expose unnec- 
essary barriers and unsafe accesses in loops. 

The CLTestCheck framework aims to help developers assess how well OpenCL kernels have been tested, identify kernel regions that require further testing, and uncover bugs sensitive to work-group schedules. In summary, the main contributions of this paper are:


1. Schedule amplification to evaluate test executions using different work-group 
schedules. 

2. Definition and measurement of kernel code coverage considering synchronisa- 
tion statements, loop boundaries and branch conditions. 
3. Fault seeder for OpenCL kernels that seeds faults from different classes. The
seeded faults are used to assess the effectiveness of test suites with respect to 
fault finding. 

4. Empirical evaluation on a collection of 82 publicly available GPU kernels, 
examining coverage, fault finding and inter work-group interactions. 


The rest of this paper is organised as follows. We present background on the OpenCL programming model in Sect. 2. Related work in GPU program testing and verification is discussed in Sect. 3. CLTestCheck capabilities are discussed in Sect. 4. The experiment setup and the results of our empirical evaluation are discussed in Sects. 5 and 6, respectively.


2 Background 


The success of GPUs in the past few years has been due to the ease of pro- 
gramming using the CUDA [17] and OpenCL [7] parallel programming models, 
which abstract away details of the architecture. In these programming models, 
the developer uses a C-like programming language to implement algorithms. The 
parallelism in those algorithms has to be exposed explicitly. We now present a 
brief overview of the core concepts of OpenCL, the programming model used in 
this paper. 

OpenCL is a programming framework and standard from the Khronos Group for heterogeneous parallel computing on cross-vendor and cross-platform hardware. In the OpenCL architecture, a CPU-based Host controls multiple Compute Devices (for instance, CPUs and GPUs are different compute devices). Each of these coarse-grained compute devices consists of multiple Compute Units, which in turn contain one or more processing elements (a.k.a. streaming processors). The processing elements concurrently execute groups of individual threads, referred to as work-groups. The functions executed by the GPU threads are called kernels, and they are parameterised by thread and group id variables. OpenCL has four types of memory regions:

- global memory, shared by all threads in all work-groups;
- constant memory, also shared by all threads in all work-groups;
- local memory, shared by threads within the same work-group;
- private memory, one per thread.

Kernels cannot write to the constant memory.
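The thread hierarchy determines how each thread (work-item) locates its data. In OpenCL C, a kernel reads its position with get_global_id, get_group_id, get_local_id and get_local_size. With uniform work-group sizes and no global offset, the global id in a dimension equals group id × local size + local id. The plain C sketch below emulates this one-dimensional mapping on the host so it can run without a GPU. The type and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side emulation of a 1-D OpenCL NDRange with uniform work-groups
 * and no global offset. */
typedef struct {
    size_t group_id;   /* what get_group_id(0) would return */
    size_t local_id;   /* what get_local_id(0) would return */
} work_item;

/* global id = group id * local size + local id */
size_t global_id(work_item w, size_t local_size) {
    assert(w.local_id < local_size);
    return w.group_id * local_size + w.local_id;
}

/* Inverse mapping: recover (group, local) ids from a global id. */
work_item from_global_id(size_t gid, size_t local_size) {
    assert(local_size > 0);
    work_item w = { gid / local_size, gid % local_size };
    return w;
}
```

With 4 threads per work-group, thread 3 of work-group 2 has global id 11. Threads of the same work-group can share local memory. Threads in different work-groups can communicate only through global memory.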

GPUs have a SIMT (single instruction, multiple thread) execution model that executes batches of threads (warps) in lock-step, i.e. all threads in a warp execute the same instruction but on different data. If the control flow of threads within the same warp diverges, the different execution paths are scheduled sequentially until the control flows reconverge and lock-step execution resumes. Sequential scheduling caused by divergence results in a performance penalty, slowing down execution of the kernel.
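The cost of divergence can be illustrated by counting how many paths a warp must step through for an if-else. The C sketch below is a simplified model and the function name is hypothetical. It assumes the warp executes the then-path for the threads whose predicate holds and then the else-path for the rest, with inactive threads masked out. Real hardware and compilers may schedule or predicate these paths differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified SIMT model: returns how many sequential passes a warp needs
 * for an if-else, given each thread's branch predicate. */
int simt_passes(const bool *pred, size_t warp_size) {
    bool any_then = false, any_else = false;
    for (size_t t = 0; t < warp_size; t++) {
        if (pred[t]) any_then = true;
        else any_else = true;
    }
    /* Each path that at least one thread takes is executed once,
     * with the other threads masked off. */
    return (int)any_then + (int)any_else;
}
```

A warp whose threads all agree needs one pass. A warp with mixed predicates needs two passes, so each thread waits roughly for both paths.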

Betts et al. [2] describe two specific classes of bugs that make GPU kernels harder to verify than sequential code: data races and barrier divergence.

- An inter work-group data race occurs when a global memory location is written by one or more threads from one work-group and accessed by one or more threads from another work-group.
- An intra work-group data race occurs when a global or local memory location is written by one thread and accessed by another thread from the same work-group.

A barrier is a synchronisation mechanism for threads within a work-group in OpenCL, and is used to prevent intra work-group data race errors. Barrier divergence occurs if threads in the same group reach different barriers. In that case kernel behaviour is undefined [2], and it may lead to intra work-group data races.
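The inter work-group race definition can be checked mechanically on a log of memory accesses. The C sketch below assumes a hypothetical trace format: one record per global-memory access, holding the work-group id, thread id, address and a read/write flag. It flags a pair of accesses as an inter work-group race when they touch the same address from different work-groups and at least one is a write. A real detector would also have to account for synchronisation such as atomics, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trace record for one global-memory access. */
typedef struct {
    unsigned group;    /* work-group id of the accessing thread */
    unsigned thread;   /* local thread id within the work-group */
    size_t address;    /* global memory address (or array index) */
    bool is_write;
} mem_access;

/* True if two accesses from different work-groups touch the same
 * global location and at least one of them writes it. */
static bool inter_group_conflict(const mem_access *a, const mem_access *b) {
    return a->group != b->group && a->address == b->address &&
           (a->is_write || b->is_write);
}

/* Counts conflicting pairs in the trace (quadratic, for illustration). */
size_t count_inter_group_races(const mem_access *log, size_t n) {
    size_t races = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (inter_group_conflict(&log[i], &log[j]))
                races++;
    return races;
}
```

In the test trace below, a write by work-group 0 followed by a read from work-group 1 at the same address counts as one race. Two reads of the same address from different work-groups do not count.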

In this paper, we focus on covering barrier functions to help detect intra 
work-group barrier divergence errors and revealing problems with inter work- 
group interactions using work-group schedule amplification. 


3 Related Work 


We discuss related work in the context of work-group synchronisation, verifica- 
tion and testing of GPU programs. 


Inter Work-group Synchronisation for OpenCL Kernels. Barrier functions in the 
OpenCL specification [7] help synchronise threads within the same work-group. 
There is no mechanism, however, to synchronise threads belonging to different 
work-groups. One solution to this problem is to split a program into multiple kernels, with the CPU executing the kernels in sequence, which provides implicit synchronisation. The drawback of this method is the overhead incurred in
launching multiple kernels. Xiao et al. [24] proposed an implementation of inter 
work-group barrier that relies on information on the number of work-groups. 
This method is not portable as the number of launched work-groups depends on 
the device. Sorensen et al. [22] extended it to be portable by discovering work- 
group occupancy dynamically. Their implementation of inter work-group barrier 
synchronisation is useful when the developer knows there is interaction between 
work-groups that needs to be synchronised. Our contribution is in detecting 
undesired inter work-group interactions, not intended by the developer. 


GPU Kernel Verification. Verification of GPU kernels to detect data races and 
barrier divergence bugs has been explored in the past. Li et al. [14] introduced a 
Satisfiability Modulo Theories (SMT) based approach for analysing GPU kernels 
and developed a tool called Prover of User GPU (PUG). The main drawback of 
this approach is scalability. With an increasing number of threads, the number of 
possible thread interleavings grows exponentially, making the analysis infeasible for a large number of threads. GRace [25] and GMRace [26] were developed for
CUDA programs to detect data races using both static and dynamic analysis. 
However, they do not support detection of inter work-group data races. 

GKLEE [15] and KLEE-CL [3], which are based on dynamic symbolic execution, provide data race checks for CUDA and OpenCL kernels, respectively. Both tools are restricted by the need to specify a certain number of threads and by the lack of support for custom synchronisation constructs. Scalability and general applicability are challenges for these tools.

Leung et al. [13] present a flow-based test amplification technique for verifying 
race freedom and determinism of CUDA kernels. For a single test input under a 
particular thread interleaving, they log the behaviour of the kernel and check the 
property. They then amplify the result of the test to hold over all the inputs that 
have the same values for the property integrity-inputs. The test amplification 
approach in [13] can check the absence of data-races, not the presence. Addi- 
tionally, their approach amplifies across the space of test inputs, not work-group 
schedules as done in our schedule amplifier. GPUVerify [2] is a static analysis tool that transforms a parallel GPU kernel into a two-threaded predicated
program with lock-step execution and checks data races over this transformed 
model. The drawback of GPUVerify is that it may report false alarms and has 
limited support for atomic operations. 


Test Effectiveness Measurement. Measuring effectiveness of tests in terms of 
code coverage and fault finding is common for CPU programs [6,18]. Support for 
GPU programs is scarce. GKLEE is the only tool that provides support for code 
coverage for CUDA GPU kernels. Given a kernel, it converts it into its sequential 
C program version (using Perl scripts) and applies the Gcov utility supplied 
with GCC for measuring code coverage. This form of coverage measurement 
disregards the GPU programming model. In our approach, we measure coverage 
conforming to the OpenCL programming model. With respect to fault seeder 
and schedule amplification, we are not aware of any existing work that provides 
these capabilities for GPU kernels to help measure effectiveness of test suites. 
The CLTestCheck framework is discussed next, in Sect. 4.


4 Our Approach 


In this Section, we present the CLTestCheck framework that provides capabilities 
for kernel code coverage measurement, mutant generation and schedule amplification. To understand the kinds of programming bugs¹ encountered by OpenCL
developers, we surveyed several publicly available OpenCL kernels and associ- 
ated bug fix commits. A summary of our findings is shown in Table 1. We found 
bugs most commonly occur in the following OpenCL code constructs: barriers, 
loops, branches, global memory accesses and arithmetic computations. We seek 
to aid the developer in assessing quality of test suites in revealing these bug 
types using CLTestCheck. A detailed discussion of CLTestCheck capabilities is 
presented in the following sections. 


4.1 Kernel Code Coverage 


We define coverage over barriers, loops and branches in OpenCL code to check 
rigour of test suites in exercising these code structures. 


Branch Coverage. GPU programs are highly parallelised and are executed by numerous processing elements, each of which executes groups of threads in lock step. This is very different from parallelism in CPU programs, where each thread may execute different instructions with no implicit synchronisation of the kind seen in lock-step execution. Kernel code for all the threads is the same; however, the threads may diverge and follow different branches based on the input data they process. As seen in Table 1, uncovered branches and branch conditions are an important class of OpenCL bugs. Lidbury et al. [16] report in their work that branch coverage


1 These are kernel bugs that violate the specification of the program or are associated 
with executions that lead to undefined behaviour. 
Table 1. Summary of bug fixing commits we collected

# | Code structure          | Bug type                       | Repository
1 | Barrier                 | Missing barriers               | Winograd-OpenCL [10], histogram [13], reduction [13], OP2 [3]
2 | Barrier                 | Removing unnecessary barriers  | Winograd-OpenCL [10]
3 | Loop                    | Incorrect condition            | mcxcl [5], particles [8]
4 | Loop                    | Incorrect boundary value       | clSPARSE [1]
5 | Loop                    | Missing loop boundary          | Pannotia [21]
6 | Branch                  | Missing else branch            | liboi [11]
7 | Branch                  | Incorrect condition            | mcxcl [5], ClGaussianPyramid [4]
8 | Global memory access    | Inter work-group data race     | Parboil-spmv [16], lonestar-bfs [21], lonestar-sssp [21]
9 | Arithmetic computations | Incorrect arithmetic operators | mcxcl [5], ClGaussianPyramid [4]

measurement is crucial for GPU programs but is currently lacking. To address this need, we define branch coverage for GPU programs as follows:

$$\text{branch coverage} = \frac{\#\text{covered branches}}{\text{total }\#\text{branches}} \times 100\% \qquad (1)$$


Branch coverage measures adequacy of a test suite by checking if each branch 
of each control structure in GPU code has been executed by at least one thread. 
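Equation (1) can be computed from per-branch hit counters. In the C sketch below, each branch (the then- and else-branch of every control structure) has a counter that is incremented whenever a thread executes it. The array layout is an assumption for illustration, not the data structure CLTestCheck uses.

```c
#include <assert.h>
#include <stddef.h>

/* hits[k] counts how many threads (over all tests) executed branch k,
 * where the then- and else-branch of a control structure are counted
 * as separate branches. Returns coverage in percent, as in Eq. (1). */
double branch_coverage(const unsigned *hits, size_t total_branches) {
    assert(total_branches > 0);
    size_t covered = 0;
    for (size_t k = 0; k < total_branches; k++)
        if (hits[k] > 0) /* executed by at least one thread */
            covered++;
    return 100.0 * (double)covered / (double)total_branches;
}
```

Consider a kernel with two if-else statements in which one else-branch is never taken. Three of its four branches are covered, giving 75% branch coverage.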


Loop Boundary Coverage. In our survey of kernel bugs shown in Table 1, we found that bugs related to loop boundary values and loop conditions were fairly common. For instance, bug #3 found in the mcxcl program allowed the loop index to access memory locations beyond the end of the array due to an erroneous loop condition. We assess the adequacy of test executions with respect to loops by considering the following cases:

1. Loop body is not executed,
2. Loop body is executed exactly once,
3. Loop body is executed more than once,
4. Loop boundary value is reached.

$$\text{loop boundary coverage}_{\text{case}_i} = \frac{\#\text{loops satisfying case}_i}{\text{total }\#\text{loops}} \times 100\% \qquad (2)$$

where $\text{case}_i$ refers to one of the four loop execution cases listed above.
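Equation (2) needs, for each loop, the iteration counts observed across executions. The C sketch below classifies loops into the four cases using a hypothetical per-loop record. The record holds whether 0, exactly 1 and more than 1 iterations were observed, the maximum observed count, and a statically known boundary (the iteration count at which the loop condition's bound is reached). Such a boundary may not be available for every loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-loop summary gathered from all thread executions. */
typedef struct {
    bool saw_zero, saw_one, saw_many; /* 0, exactly 1, >1 iterations */
    unsigned max_iterations;          /* largest count observed */
    unsigned boundary;                /* count at which the bound is hit */
} loop_record;

enum loop_case { LOOP_ZERO, LOOP_ONCE, LOOP_MANY, LOOP_BOUNDARY };

static bool satisfies(const loop_record *r, enum loop_case c) {
    switch (c) {
    case LOOP_ZERO: return r->saw_zero;
    case LOOP_ONCE: return r->saw_one;
    case LOOP_MANY: return r->saw_many;
    case LOOP_BOUNDARY: return r->max_iterations >= r->boundary;
    }
    return false;
}

/* Loop boundary coverage for one case, in percent, as in Eq. (2). */
double loop_coverage(const loop_record *loops, size_t n, enum loop_case c) {
    assert(n > 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (satisfies(&loops[i], c))
            count++;
    return 100.0 * (double)count / (double)n;
}
```

Each case is reported separately, so a test suite can reach 100% for "more than once" while still leaving the zero-iteration case untested.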


CLTestCheck: Measuring Test Effectiveness for GPU Kernels 321

Barrier Coverage. Barrier divergence occurs when the number of threads within a work-group executing a barrier is not the same as the total number of threads in that work-group. Kernel behaviour with barrier divergence is undefined. Barrier-related bugs, namely missing barriers and unnecessary barriers, are a common class of GPU bugs according to our survey. We define barrier coverage as follows:

$$\text{barrier coverage} = \frac{\#\text{covered barriers}}{\text{total }\#\text{barriers}} \times 100\% \qquad (3)$$

Barrier coverage measures the adequacy of a test suite by checking whether each barrier in GPU code is executed correctly. A barrier is covered, i.e. executed correctly without barrier divergence, when it is executed by all threads in any given work-group.
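Barrier coverage in Eq. (3) requires distinguishing a barrier executed by every thread of a work-group from a divergent one. The C sketch below assumes hypothetical per-work-group counters of how many threads reached each barrier. It treats a barrier as covered when at least one work-group executed it and no work-group reached it with only part of its threads. This is one reading of the definition above, not necessarily the exact rule CLTestCheck applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* reached[g * num_barriers + b] = number of threads of work-group g
 * that executed barrier b; every work-group has group_size threads. */
static bool barrier_covered(const unsigned *reached, size_t num_groups,
                            size_t num_barriers, size_t b,
                            unsigned group_size) {
    bool executed = false;
    for (size_t g = 0; g < num_groups; g++) {
        unsigned n = reached[g * num_barriers + b];
        if (n == 0) continue;              /* group never got here */
        if (n != group_size) return false; /* barrier divergence */
        executed = true;
    }
    return executed;
}

/* Barrier coverage in percent, as in Eq. (3). */
double barrier_coverage(const unsigned *reached, size_t num_groups,
                        size_t num_barriers, unsigned group_size) {
    assert(num_barriers > 0);
    size_t covered = 0;
    for (size_t b = 0; b < num_barriers; b++)
        if (barrier_covered(reached, num_groups, num_barriers, b, group_size))
            covered++;
    return 100.0 * (double)covered / (double)num_barriers;
}
```

In the test data, with two work-groups of four threads, barrier 1 is reached by only two threads of work-group 1. That is barrier divergence, so only one of the two barriers counts as covered.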


4.2 Fault Seeding 


Mutation testing is known to be an effective means of estimating the fault finding effectiveness of test suites for CPU programs [9]. We generate mutations using traditional mutant operators, namely arithmetic, relational, bitwise, logical and assignment operator types. In Table 1, bug fixes #3, #7 and #9 show that traditional arithmetic and relational operator mutations remain applicable to GPU programs. In addition, we define three mutations specifically for OpenCL kernels, inspired by bug fixes #1 to #5: barrier mutation, image access mutation and loop boundary mutation.

The barrier mutation operator we define is deletion of an existing barrier function call, to reproduce bugs similar to #1 and #2 in Table 1. OpenCL provides 2D and 3D image data structures to facilitate access to images, since multidimensional arrays are not supported in OpenCL. Image structures are accessed using read and write functions that take the pixel coordinates in the image as a parameter. We perform image access mutations for 2D or 3D coordinates by increasing or decreasing one of the coordinates, or by exchanging coordinates. Finally, we define loop boundary mutations as (1) skipping the loop, (2) allowing n-1 iterations of the loop, or (3) allowing n+1 iterations of the loop, where n is the number of iterations when the loop boundary is reached. The mutant operators we use in this paper are summarised in Table 2.


Table 2. Summary of mutation operators

Type of operator       Mutants
Arithmetic (binary)    +, -, *, /, %
Arithmetic (unary)     - (negation), ++, --
Relational             <, >, ==, <=, >=, !=
Logical                &&, ||, !
Bitwise                &, |, ^, ~, <<, >>
Assignment             =, +=, -=, *=, /=, %=, <<=, >>=, &=, |=, ^=
Barrier                Delete barrier function call
Image coordinates      Change coordinates when accessing images
Loop boundary          Change the boundary value in loop condition check
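Operator-replacement mutants like those in Table 2 can be enumerated mechanically. The C sketch below generates one mutant per alternative relational operator in the table for a single occurrence of a relational operator. The string-based representation and function name are simplifications for illustration. The implementation described in Sect. 4.4 instead works on a template form of the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Relational operators from Table 2. */
static const char *const RELATIONAL[] = { "<", ">", "==", "<=", ">=", "!=" };
#define NUM_RELATIONAL (sizeof RELATIONAL / sizeof RELATIONAL[0])

/* Writes every relational operator other than `original` to `out`
 * (capacity cap) and returns the number of mutants produced. Returns 0
 * if `original` is not a relational operator. */
size_t relational_mutants(const char *original, const char **out, size_t cap) {
    int known = 0;
    for (size_t i = 0; i < NUM_RELATIONAL; i++)
        if (strcmp(original, RELATIONAL[i]) == 0) known = 1;
    if (!known) return 0;

    size_t n = 0;
    for (size_t i = 0; i < NUM_RELATIONAL; i++)
        if (strcmp(original, RELATIONAL[i]) != 0) {
            assert(n < cap);
            out[n++] = RELATIONAL[i];
        }
    return n;
}
```

For a loop condition `i < n` with i counting up from 0, this yields five mutants. One of them, `i <= n`, allows one extra iteration.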
4.3 Schedule Amplification


When a kernel execution is launched, the GPU schedules work-groups on compute units in a certain order. Presently, there is no provision for determining this schedule or setting it in advance. The scheduler makes the decision on the fly, subject to the availability of compute units and the readiness of work-groups for execution. The order in which work-groups are executed with the same test input can differ every time the kernel is executed. The OpenCL specification has no execution model for inter work-group interactions and provides no guarantees on how work-groups are mapped to compute units.

In our approach, we execute each test input over a set of schedules. In each schedule, we fix the work-group that should execute first, and all other work-groups wait until it has finished execution. The work-group going first is picked so that we achieve a uniform distribution over the entire range of work-groups in the set of schedules. The order of execution for the remaining work-groups is left to the scheduler. For a test case T over a kernel with G work-groups, we generate N schedules, with N < G, such that a different work-group is executed first in each of the N schedules. The number of schedules, N, that we generate is much smaller than the total number of schedules, which is typically infeasible to check.

We fix only the first work-group in the schedule because most data races or deadlocks involve interactions between two work-groups. Fixing one of them and picking a different work-group each time significantly reduces the search space of possible schedules. We cannot provide guarantees with this approach. However, at little extra cost, we are able to check significantly more schedules than is currently possible. We believe this approach will be effective in revealing issues, if any, in inter work-group interactions.

To illustrate this, we consider a kernel co running on four work-groups. The 
CLTestCheck schedule amplifier will insert code on the host and GPU side, 
shown in Listings 1.1 and 1.2, to generate different work-group schedules. 


Listing 1.1: Schedule OpenCL kernel (CPU-side) 


// Generate a value in the range [0, 4) (rand() from <stdlib.h>)
int target_group = rand() % 4;
// Pass the value as a macro definition to the GPU code via the build options
char clOptions[64];
snprintf(clOptions, sizeof clOptions, "-DTARGET_GROUP=%d", target_group);


Listing 1.2: Schedule OpenCL kernel (GPU-side) 


// num_threads_finishes: assumed volatile __global int*, initialised to 0
if (my_group_id == TARGET_GROUP) {
    // Original code here executed by target group
    A[(1 - buf) * 4 + tid] = A[buf * 4 + (tid + 1) % 4];
    atomic_inc(num_threads_finishes);
} else {
    // Wait until every thread of the target group has finished
    while (*num_threads_finishes != group_size) continue;
    // Original code executed by other groups
    A[(1 - buf) * 4 + tid] = A[buf * 4 + (tid + 1) % 4];
}


In this example, before the GPU kernel is launched, the host side generates a 
random value in the range of available work-group ids. This value is the id of the 
selected work-group to be executed first and is passed to the kernel code using a 
macro definition. On the kernel side, each thread determines if it belongs to the
selected work-group. Threads in the selected work-group proceed with executing 
the kernel code while threads belonging to other work-groups wait. After the 
selected work-group completes execution, the remaining work-groups execute 
the original kernel in an order based on mapping to available compute units 
(occupancy bound execution model [22]). With different work-group schedules 
generated by the schedule amplifier, we were able to detect the presence of inter 
work-group data races using a single GPU platform. Betts et al. [2], on the other 
hand, focus on intra work-group data races on different GPU platforms. 
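The effect the amplifier looks for can be reproduced in a small sequential simulation. In the C sketch below, each hypothetical work-group g updates a shared array in place with A[g] = A[(g + 1) % G] + 1, so it reads a location written by another work-group. The groups run one after another with a chosen group first, and the others follow in ascending order as a stand-in for the scheduler's choice. Different first groups yield different outputs, which is the symptom of an inter work-group data race that the amplifier reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define G 4 /* number of work-groups in this toy example */

/* Toy "kernel" body for work-group g: reads the element owned by the
 * next work-group, so the result depends on execution order. */
static void run_group(int *A, int g) {
    A[g] = A[(g + 1) % G] + 1;
}

/* Executes `first` to completion, then the remaining work-groups in
 * ascending id order (standing in for the non-deterministic scheduler). */
static void run_schedule(int *A, int first) {
    run_group(A, first);
    for (int g = 0; g < G; g++)
        if (g != first) run_group(A, g);
}

/* Runs one schedule per choice of first work-group on the same input and
 * reports whether any schedule's output differs from the group-0-first one. */
bool outputs_depend_on_schedule(const int *input) {
    int reference[G], out[G];
    memcpy(reference, input, sizeof reference);
    run_schedule(reference, 0);
    for (int first = 1; first < G; first++) {
        memcpy(out, input, sizeof out);
        run_schedule(out, first);
        if (memcmp(out, reference, sizeof out) != 0)
            return true;
    }
    return false;
}
```

Starting from an all-zero array, running group 0 first gives {1, 1, 1, 2}, while running group 1 first gives {2, 1, 1, 3}. The test input therefore exposes the race.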


4.4 Implementation 


CLTestCheck is implemented using Clang LibTooling [12]. We instrument OpenCL kernel source code to measure coverage, generate mutations and generate multiple work-group schedules automatically. Our implementation is available at https://github.com/chao-peng/CLTestCheck.

Coverage Measurement. To record branches, loops and barriers executed 
within each kernel when running tests, we instrument the kernel code with data 
structures and statements recording the execution of these code structures. For 
each work-group, we introduce three local arrays, whose size is determined by the 
number of branches, loops and barriers accessible by threads in that work-group. 
To measure branch coverage, we add statements at the beginning of each then- 
and else-branch to record whether that branch is enabled. Similarly, statements 
to record the number of iterations of loops are added at the beginning of each loop 
body. At the end of the kernel, the information contained in the data structures 
is processed to compute coverage. 

Fault Seeder and Mutant Execution. The CLTestCheck fault seeder gen- 
erates mutants and executes them with each of the tests in the test suite to 
compute mutation score, as the fraction of mutants killed. The CLTestCheck 
fault seeder translates the target kernel source code into an intermediate form 
where all the applicable operators are replaced by a template string containing 
the original operator, its ID and type. The tool then generates mutants from this 
intermediate form. Once mutants are generated, the tool executes each of the 
mutant files and checks if the test suite kills the mutant. We consider a mutant killed if any of the following occurs: the program crashes, deadlocks, or produces a result different from that of the original kernel code.
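The kill criterion and mutation score can be summarised in a few lines. The C sketch below assumes a hypothetical per-mutant result record produced by running the whole test suite. A mutant counts as killed if any test crashed, deadlocked (for example, detected by a timeout) or produced output different from the original kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of running the whole test suite on one mutant. */
typedef struct {
    bool crashed;          /* some test made the mutant crash */
    bool deadlocked;       /* some test did not terminate (e.g. timeout) */
    bool output_differs;   /* some test output differed from the original */
} mutant_result;

static bool killed(const mutant_result *r) {
    return r->crashed || r->deadlocked || r->output_differs;
}

/* Mutation score = killed mutants / all mutants, as a fraction in [0, 1]. */
double mutation_score(const mutant_result *results, size_t n) {
    assert(n > 0);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (killed(&results[i])) k++;
    return (double)k / (double)n;
}
```

With four mutants, of which one crashes, one deadlocks, one changes the output and one survives, the score is 3/4 = 0.75.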

Schedule Amplification. As mentioned earlier, we generate several schedules for each test execution by requiring a target work-group to execute the kernel code first and then allowing the other work-groups to proceed. The target work-group is selected uniformly across the input space of work-group ids. To achieve coverage of this input space, we partition work-group ids into sets of 10 work-groups. Thus, if we have N work-groups, we partition them into N/10 sets. The first set has work-group ids 0 to 9, the second set has ids 10 to 19, and so on.

We then randomly pick a target work-group, $W_i$, from each of these sets to go first, and generate a corresponding schedule of work-groups, $\{W_i, S_{N-1}\}$. Here $S_{N-1}$ refers to the schedule of the remaining $N-1$ work-groups generated by the GPU execution model, which is non-deterministic. For N/10 sets of work-groups, we will have N/10 schedules of the form $\{W_i, S_{N-1}\}$ (a $W_i$-first schedule). The test input is executed using each of these N/10 $W_i$-first schedules. Due to the non-deterministic nature of $S_{N-1}$, we repeat the test execution with a chosen $W_i$-first schedule 20 times. This enables us to check whether the execution model generates different $S_{N-1}$ orderings, and to evaluate executions with 20 such orderings.


5 Experiment 


In our experiment, we evaluate the feasibility and effectiveness of the coverage 
metrics, fault seeder and work-group schedule amplifier proposed in Sect. 4 using 
OpenCL kernels from industry standard benchmark families and their associated 
test suites. We investigate the following questions: 


Q1. Coverage Achieved: What is the branch, barrier and loop coverage 
achieved by test suites over OpenCL kernels in our subject benchmarks? 
To answer this question, we use our implementation to instrument and anal- 
yse kernel source code to record visited branches, barrier functions, loop 
iterations along with information on executing work-group and threads. 

Q2. Fault Finding: What is the mutation score of test suites associated with the subject programs?
For each benchmark, we generate all possible mutants by analysing the kernel source code and applying the mutation operators discussed in Sect. 4 to eligible locations. We then assess the number of mutants killed by the tests associated with each benchmark. To check if a mutant is killed, we compare execution results between the original program and the mutant.

Q3. Deadlocks and Data Races: Can the tests in the test suite give rise to 
unusual behaviour in the form of deadlocks or data races? Deadlocks occur 
when two or more work-groups are waiting on each other for a resource. 
Inter work-group data races occur when test executions produce different 
outputs for different work-group schedules. For each test execution in each 
benchmark, we generate 20 × N/10 different work-group schedules, where N
is total number of work-groups for the kernel, and check if the outputs from 
the execution change based on work-group schedule. 


Subject Programs. We used the following benchmarks for our experiments:

1. nine scientific benchmarks with 23 OpenCL kernels from the Parboil benchmark suite [23];
2. the scan benchmark [20], with 3 kernels, which computes a parallel prefix sum;
3. five applications containing 13 kernels from the Rodinia benchmark suite for heterogeneous computing;
4. 20 benchmarks from PolyBench with 43 kernels spanning linear algebra, data mining and stencil computations.

We ran our experiments on an Intel CPU (i5-6500) and GPU (HD Graphics 530) using OpenCL SDK 2.0.


6 Results and Analysis 


For each of the subject programs presented in Sect. 5, we ran the associated test suites and report results in terms of coverage achieved, fault finding and overhead incurred with the CLTestCheck framework. We executed the test suites 20 times for each measurement. Our results in the context of the questions in Sect. 5 are presented below.
6.1 Coverage Achieved


Branch and loop coverage (with 0, exactly 1 and >1 iterations) for each of the subject programs in the three benchmark suites² is shown in the plots in Fig. 1. The first row shows branch coverage, the second loop coverage. Mutation score and surviving mutation types, shown in the last two rows of Fig. 1, are discussed in Sect. 6.2.


[Fig. 1 plots, one column each for Parboil, Scan/Rodinia and PolyBench. Rows: branch coverage (%); loop coverage (%) for 0 iterations, 1 iteration and >1 iterations; mutation score (%); surviving mutations by operator type (%). The x-axis is the benchmark number (S#) in the first three rows and the operator type in the last row.]


Fig. 1. Coverage achieved (branch and loop), mutation score and percentage of surviving mutations by type for each subject program in the 3 benchmark suites.


² Counting different test suites separately, there are 20 applications in Parboil, 6 in Scan/Rodinia, and 20 in PolyBench.
Barrier coverage is not shown in the plots since, for all but one of the applications with barriers, the associated test suites achieved 100% barrier coverage. The only subject program with less than 100% barrier coverage was scan, which had 87.5% barrier coverage. The uncovered barrier is in a loop whose condition does not allow some threads to enter the loop, resulting in barrier divergence between threads. We find that less than 100% barrier coverage is a useful indicator of barrier divergence in code.
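The link between barrier coverage and barrier divergence can be sketched as follows, assuming (our reading, not necessarily the exact definition in Sect. 4) that a barrier site counts as covered only when every thread of the work-group reaches it. The trace format, the site names and the helper `trace_divergent_kernel` are hypothetical, the kernel is simulated on the host rather than instrumented, and the 7-of-8 split is illustrative only.

```python
from typing import Dict, List, Set

def barrier_coverage(reached: Dict[str, Set[int]], wg_size: int) -> float:
    """Percentage of barrier sites executed by *all* threads of a work-group.

    Assumption: a site reached by only some threads counts as uncovered.
    """
    if not reached:
        return 100.0
    all_threads = set(range(wg_size))
    covered = sum(1 for tids in reached.values() if tids == all_threads)
    return 100.0 * covered / len(reached)

def divergent_barriers(reached: Dict[str, Set[int]], wg_size: int) -> List[str]:
    """Barrier sites reached by some, but not all, threads (barrier divergence)."""
    all_threads = set(range(wg_size))
    return sorted(s for s, tids in reached.items() if tids and tids != all_threads)

def trace_divergent_kernel(wg_size: int, loop_limit: int) -> Dict[str, Set[int]]:
    """Hypothetical trace: 7 barriers hit by every thread, plus one barrier
    inside a loop whose condition (tid < loop_limit) excludes some threads."""
    reached = {f"b{i}": set(range(wg_size)) for i in range(7)}
    reached["b_loop"] = {tid for tid in range(wg_size) if tid < loop_limit}
    return reached

trace = trace_divergent_kernel(wg_size=8, loop_limit=4)
print(barrier_coverage(trace, 8))    # 87.5
print(divergent_barriers(trace, 8))  # ['b_loop']
```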

Branch Coverage. For most subject programs in Parboil and Scan/Rodinia, test suites achieve high branch coverage (>83%). The histo benchmark is an outlier, with a low branch coverage of 31.6%. Its kernel function, histo_main, contains 20 branches in a code block handling an exception condition (overflow). The test suite provided with histo does not raise the overflow exception and, as a result, these branches are never executed. We found that the uncovered branches in other applications in Parboil and Scan/Rodinia (with >80% coverage) also result from exception handling code that is not exercised by the associated test data.

Branch coverage achieved for 13 of the 20 applications in PolyBench is at 50%. This is very low compared with the other benchmark suites. Upon investigating the kernel code, we found that all the uncovered branches reside within a condition check for out-of-range array indices. The tests associated with a majority of the applications did not exercise out-of-range array index accesses, resulting in low branch coverage.
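The effect of such bounds guards on branch coverage can be illustrated with a small host-side simulation. This is a sketch under our own assumptions: branch coverage is taken as the fraction of branch outcomes (true/false) observed, the kernel is emulated in Python, and `guarded_kernel` is hypothetical. When the tests launch exactly as many work-items as array elements, the guard's false outcome is never taken and coverage stays at 50%.

```python
from typing import Iterable, List, Set, Tuple

Hit = Tuple[str, bool]  # (branch id, outcome observed)

def guarded_kernel(global_size: int, n: int, hits: Set[Hit]) -> List[int]:
    """Emulates `if (gid < n) out[gid] = 2 * gid;` for every work-item."""
    out = [0] * n
    for gid in range(global_size):
        in_range = gid < n            # out-of-range index check
        hits.add(("bounds_guard", in_range))
        if in_range:
            out[gid] = 2 * gid
    return out

def branch_coverage(hits: Set[Hit], branch_ids: Iterable[str]) -> float:
    """Percentage of (branch, outcome) pairs observed during execution."""
    ids = list(branch_ids)
    covered = sum((b, o) in hits for b in ids for o in (True, False))
    return 100.0 * covered / (2 * len(ids))

hits: Set[Hit] = set()
guarded_kernel(global_size=1024, n=1024, hits=hits)  # global size == data size
print(branch_coverage(hits, ["bounds_guard"]))       # 50.0

guarded_kernel(global_size=1088, n=1000, hits=hits)  # size rounded up to a multiple of 64
print(branch_coverage(hits, ["bounds_guard"]))       # 100.0
```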
Loop Coverage. Test suites for nearly all applications (with loops) execute loops more than once. Thus, coverage for >1 iterations is 100% for all but one of the applications, srad in the Rodinia suite, which has 80%. The uncovered loop in srad is in an uncovered then-branch that checks exception conditions. We also checked whether the boundary value in loop conditions is reached when >1 iterations is covered by test executions. We found pathfinder in Rodinia to be the only application with full coverage for >1 iterations that does not reach the boundary value. This unusual scenario arises because one of the loops in pathfinder is exited using a break statement.

We find that test suites for most applications are unable to achieve any loop coverage for 0 and exactly 1 iteration. The boundary condition for most loops is based on the size of the work-groups, which is typically much greater than 1. As a result, test suites have been unable to skip the loop or execute it exactly once. The only exceptions were applications in the Parboil suite (bfs, cutcp, mri-gridding, spmv) and two applications in Rodinia (lud, srad) that have boundary values dependent on variables that may be set to 0 or 1.
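The three loop-coverage buckets can be computed from per-execution trip counts. The sketch below is ours, not CLTestCheck's implementation: it records, for each hypothetical loop id, the trip counts observed across threads, and shows why a loop bounded by the work-group size only ever lands in the >1 bucket unless a test sets the bound to 0 or 1.

```python
from collections import defaultdict
from typing import Callable, Dict, Set

BUCKETS: Dict[str, Callable[[int], bool]] = {
    "0": lambda c: c == 0,
    "1": lambda c: c == 1,
    ">1": lambda c: c > 1,
}

def loop_coverage(trip_counts: Dict[str, Set[int]]) -> Dict[str, float]:
    """Percentage of loops observed with 0, exactly 1, and more than 1 iterations."""
    loops = list(trip_counts)
    result = {}
    for name, pred in BUCKETS.items():
        hit = sum(any(pred(c) for c in trip_counts[loop]) for loop in loops)
        result[name] = 100.0 * hit / len(loops)
    return result

def run_threads(wg_size: int, bound: int, trips: Dict[str, Set[int]], loop_id: str) -> None:
    """Each thread executes `for (k = 0; k < bound; k++)` and records its trip count."""
    for _tid in range(wg_size):
        count = 0
        for _k in range(bound):
            count += 1
        trips[loop_id].add(count)

trips: Dict[str, Set[int]] = defaultdict(set)
run_threads(wg_size=64, bound=64, trips=trips, loop_id="reduce")  # bound = work-group size
print(loop_coverage(trips))  # {'0': 0.0, '1': 0.0, '>1': 100.0}
```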

Overhead. For each benchmark and associated test suite, we assessed the overhead introduced by our approach. We compared the time needed to execute the benchmark with the instrumentation and additional data structures that we introduced for coverage measurement against the original, unchanged benchmark. Overhead varied greatly across benchmarks and test suites. Overhead for Parboil and Rodinia benchmarks was in the range of 2% to 118%. Overhead was lower for benchmarks that took longer to execute, as the additional execution time from instrumentation is a smaller fraction of the overall time. Overhead for most programs in PolyBench ranges from 2% to 70%, which is similar to the Parboil and Rodinia benchmarks. The overhead for the lu, fdtd-2d and jacobi-2d-imper programs is >100%. The code for kernel computations in these benchmarks is small, with fast execution. Consequently, the relative increase in code size and execution time after instrumentation with CLTestCheck is high.
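The overhead figures are relative slowdowns. A minimal sketch, assuming overhead is computed from mean execution times over the 20 repeated runs (our assumption) and that instrumentation adds a roughly fixed cost, illustrates why short-running kernels report much larger relative overheads. The timings are invented for illustration.

```python
from statistics import mean
from typing import Sequence

def overhead_percent(original_times: Sequence[float], instrumented_times: Sequence[float]) -> float:
    """Relative slowdown of the instrumented kernel over the original, in percent."""
    t0, t1 = mean(original_times), mean(instrumented_times)
    return 100.0 * (t1 - t0) / t0

fixed_cost = 5.0               # illustrative instrumentation cost (ms), same for both kernels
long_kernel = [100.0] * 20     # 20 repeated runs of a long-running kernel (ms)
short_kernel = [4.0] * 20      # 20 repeated runs of a short-running kernel (ms)
print(overhead_percent(long_kernel, [t + fixed_cost for t in long_kernel]))    # 5.0
print(overhead_percent(short_kernel, [t + fixed_cost for t in short_kernel]))  # 125.0
```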


6.2 Fault Finding 


Fault finding for the subject programs is assessed using the mutants we generate with the fault seeder described in Sect. 4. The mutation score, the percentage of mutants killed, is used to estimate the fault-finding capability of the test suites associated with the subject programs. Each test suite associated with a benchmark is run 20 times to determine the killed mutants. A mutant is considered killed if the test suite generates different outputs on the mutant than on the original program in all 20 repeated runs of the test suite. In addition to killed mutants, we also report results on “Undecided Mutants”, which are mutants killed in at least one, but not all, of the 20 repeated executions of the test suite. Changes in GPU thread scheduling between runs cause this uncertainty. We do not count undecided mutants as killed in the mutation score. The mutation score for all subject programs in each benchmark suite is shown in the third row of plots in Fig. 1.
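The killed/undecided distinction can be written down directly. In this sketch (our own encoding, with hypothetical mutant ids), each mutant maps to 20 booleans, one per repeated run of the test suite, recording whether the outputs differed from the original program. Undecided mutants stay in the denominator but are not counted as killed.

```python
from enum import Enum
from typing import Dict, List

class Verdict(Enum):
    KILLED = "killed"        # outputs differ in every run
    UNDECIDED = "undecided"  # outputs differ in some, but not all, runs
    SURVIVED = "survived"    # outputs never differ

def classify(differs: List[bool]) -> Verdict:
    if not differs:
        raise ValueError("need at least one run")
    if all(differs):
        return Verdict.KILLED
    if any(differs):
        return Verdict.UNDECIDED
    return Verdict.SURVIVED

def mutation_score(runs: Dict[str, List[bool]]) -> float:
    """Percentage of mutants killed in all runs; undecided mutants do not count."""
    killed = sum(classify(d) is Verdict.KILLED for d in runs.values())
    return 100.0 * killed / len(runs)

runs = {
    "m1": [True] * 20,
    "m2": [True] * 19 + [False],  # scheduling-dependent -> undecided
    "m3": [False] * 20,
    "m4": [True] * 20,
}
print({m: classify(d).value for m, d in runs.items()})
print(mutation_score(runs))  # 50.0
```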
Mutation Score. In general, we find that test suites for subject programs achieving high branch, barrier and loop coverage also have a high mutation score. For instance, the test suites for spmv and stencil achieve 100% coverage and also a 100% mutation score. An instance of a program that does not follow this trend is mri-gridding, which has 100% branch, barrier, and loop (>1 iterations) coverage but only an 82% mutation score. On analysing the surviving mutants, we found that a significant fraction (160 out of 232) were arithmetic operator mutations within a function named kernel_value, which contained variables defining a fourteenth-order polynomial and a cubic polynomial. The effect of mutations on the polynomials did not propagate to the output of the benchmark with the given test suite. The histo program, with low branch coverage and 100% barrier and loop coverage, has a 65.9% mutation score. Nearly two thirds of the branches in histo cannot be reached by the input data; as a result, none of the mutations in the unexecuted branches are killed, resulting in a low mutation score. A few of the programs in PolyBench have mutation scores between 60% and 70%. In these programs, most surviving mutations are arithmetic operator mutations.
As seen in the last row of Fig. 1, which shows surviving mutations by operator type, arithmetic operators are the dominant surviving mutations in all three benchmark suites. Control-flow-adequate tests can kill arithmetic operator mutations only if their effects propagate to a control condition or the output. Data flow coverage may be better suited to assessing tests against these mutations. Around 20% of relational operator mutations also survive in our evaluation. Most of the surviving relational operator mutations made slight changes to operators, such as < to <=, or > to >= and vice versa. The test suites provided with the benchmarks missed such boundary mutations.
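Why such boundary mutations survive is easy to see on a toy example (ours, not taken from the benchmarks): a mutant that turns `<` into `<=` only changes the output when some test input hits the boundary value exactly. The kernel body is emulated in Python, and `count_below` is hypothetical.

```python
import operator
from typing import Callable, List

Cmp = Callable[[int, int], bool]

def count_below(data: List[int], threshold: int, cmp: Cmp) -> int:
    """Emulates a kernel that counts elements satisfying `x < threshold`."""
    return sum(1 for x in data if cmp(x, threshold))

ORIGINAL: Cmp = operator.lt   # x <  threshold
MUTANT: Cmp = operator.le     # x <= threshold (boundary mutation)

def is_killed(test_inputs: List[List[int]], threshold: int) -> bool:
    """The mutant is killed if some test input yields a different output."""
    return any(
        count_below(d, threshold, ORIGINAL) != count_below(d, threshold, MUTANT)
        for d in test_inputs
    )

print(is_killed([[1, 2, 3], [10, 20]], threshold=5))  # False: no input equals 5
print(is_killed([[1, 5, 9]], threshold=5))            # True: boundary value hit
```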
Undecided mutants occur in executions of 9 out of the 46 subject programs and test suites across all three benchmark suites. The number of undecided mutants in these 9 executions is generally small (<= 5). The only exception is tpacf in the Parboil benchmark suite, which resulted in 18 undecided mutants when executing one of its test suites. Undecided mutants point to non-deterministic behaviour in the kernel that depends on the GPU thread execution model. A large number of undecided mutants is alarming, and developers should examine the kernel code more closely to ensure that the observed behaviour is as intended.

Barriers were not used in all benchmarks. Only 5 out of the 9 benchmarks in Parboil, and 4 of the 6 in Scan/Rodinia, had barriers. PolyBench programs did not use any barriers. Barrier mutations removed barrier function calls in these benchmarks, and we recorded the number of mutants killed by the test suites. The percentage of killed barrier mutations is generally low across all benchmarks with barriers. For instance, removing 2 out of 3 barriers in the histo program in Parboil, and removing all barriers in the cutcp program, had no effect on the outputs of the respective program executions. This may either mean that the test suites are inadequate with respect to the barrier mutations, or indicate that these barriers are superfluous with respect to program outputs and that the need for synchronisation should be further justified. For the programs in our experiment, we found the barriers whose mutations survived to be unnecessary.
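A barrier-removal operator can be approximated textually. The sketch below is a simplification we assume for illustration (the fault seeder of Sect. 4 may well operate on a parsed representation instead): it produces one mutant per `barrier(...)` call by replacing that call with a comment. It does not match OpenCL 2.0's `work_group_barrier`, and the kernel source is a made-up example.

```python
import re
from typing import List

# Matches OpenCL 1.x barrier calls such as `barrier(CLK_LOCAL_MEM_FENCE);`
BARRIER_CALL = re.compile(r"\bbarrier\s*\([^;]*\)\s*;")

def barrier_removal_mutants(kernel_src: str) -> List[str]:
    """Return one mutant per barrier call, with only that call removed."""
    return [
        kernel_src[: m.start()] + "/* barrier removed */" + kernel_src[m.end():]
        for m in BARRIER_CALL.finditer(kernel_src)
    ]

src = """
__kernel void k(__global float *a, __local float *tmp) {
    int lid = get_local_id(0);
    tmp[lid] = a[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);
    a[get_global_id(0)] = tmp[(lid + 1) % get_local_size(0)];
    barrier(CLK_GLOBAL_MEM_FENCE);
}
"""
for mutant in barrier_removal_mutants(src):
    print(mutant.count("barrier("))  # each mutant keeps the other barrier: 1
```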
Coverage versus Mutation Score. The plots in Fig. 1 illustrate total muta- 
tion score over all types of mutations for each subject program and test suite. 
We also compute mutation scores specifically for branches, barriers, and loops 
using mutations relevant to them. We do this to compare against branch, bar- 
rier and loop coverage achieved for each of the subject programs. We found 
that mutation score for branches closely follows branch coverage for most sub- 
ject programs. Outliers include adi, nn, convolution-2d and convolution-3d. 
Mutations that change < to <= are not killed in these kernels; these comprise 
one third of all branch mutations. 

Mutation score for barriers is quite different from barrier coverage. Test suites are able to execute the barriers and achieve coverage, but they are unable to produce different outputs when the barriers are removed. This may reflect the superfluous manner in which barriers are used in these programs.

Loop coverage with >1 iterations is 100% for all but one subject program (srad in Rodinia). Mutation score for loops, on the other hand, is variable. In general, tests achieving loop coverage are unable to reveal loop boundary mutations. The histo and srad benchmarks are worth noting, with high loop coverage but low loop mutation scores. We find that mutations to the loop boundary value in these two benchmarks survive, which implies that accesses to loop indices outside the boundary go unchecked in these programs. These unsafe values of loop indices should be disallowed in these kernels, and loop boundary mutations in our fault seeder help reveal them.


6.3 Schedule Amplification: Deadlocks and Data Races 


Kernel Deadlocks: When we used the CLTestCheck schedule amplifier on our benchmarks, we found that kernel executions deadlock when the work-group ID selected to go first exceeds the number of available compute units. As there are no guarantees on how work-groups are mapped to compute units, we allow work-group IDs exceeding the number of compute units to go first in some test executions using our schedule amplifier. However, it appears that the GPU makes unstated assumptions on which work-group IDs are allowed to go first. As noted by Sorensen et al. [22], “execution of large number of work-groups is in any occupancy bound fashion, by delaying the scheduling of some work-groups until others have executed to completion”. They observed deadlocks in kernel execution due to inter work-group barriers. However, the benchmarks in our evaluation contain no explicit inter work-group barrier. It may be the case that developers made implicit assumptions on inter work-group barriers using the occupancy bound model, and our schedule amplification approach violates this assumption. Nevertheless, our finding exposes the need for an inter work-group execution model that explicitly states the details and assumptions related to the mapping of work-groups to compute units for a given kernel on a given GPU platform.

Inter Work-group Data Races: We were able to reveal a data race in the spmv application from the Parboil benchmark suite. We found that when work-group 0 or 1 is chosen to go first in our schedules, the kernel's execution always produces the same result. However, when we pick other work-group IDs to go first, the test output is not consistent. Among twenty executions for each schedule, the frequency of producing the correct output varies from 45% to 70%.

We observe similar behaviour in the tpacf application in Parboil when we 
delete the last barrier function call in the kernel. The kernel execution produces 
consistent outputs when we pick work-group 0 or 1 to go first. When we pick 
other work-groups to go first using our schedule amplifier, the kernel execution 
results are non-deterministic. 
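The way schedule amplification exposes such order-dependent results can be mimicked on the host. In this sketch, which is our simplification and not CLTestCheck's implementation, a schedule is an order in which whole work-groups run. Each of the 20 × N/10 schedules forces a chosen work-group to go first (cycling through IDs, an assumption) and shuffles the rest with a fixed seed. The hypothetical `racy_kernel` reads a global value written by another work-group, so its output depends on the schedule; `race_free_kernel` touches only its own slot.

```python
import random
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[int], int], List[int]]

def amplified_schedules(n_groups: int, seed: int = 0) -> List[List[int]]:
    """Generate 20 * N / 10 work-group orders, each forcing one group to go first."""
    if n_groups < 1:
        raise ValueError("need at least one work-group")
    rng = random.Random(seed)
    schedules = []
    for i in range(20 * n_groups // 10):
        first = i % n_groups                      # cycle through work-group ids
        rest = [g for g in range(n_groups) if g != first]
        rng.shuffle(rest)
        schedules.append([first] + rest)
    return schedules

def racy_kernel(order: Sequence[int], n_groups: int) -> List[int]:
    out = [0] * n_groups
    last = -1                                     # global written by every work-group
    for g in order:
        out[g] = last                             # reads another group's write
        last = g
    return out

def race_free_kernel(order: Sequence[int], n_groups: int) -> List[int]:
    out = [0] * n_groups
    for g in order:
        out[g] = g * g                            # each group writes only its own slot
    return out

def schedule_dependent(kernel: Kernel, n_groups: int, seed: int = 0) -> bool:
    """True if the kernel output changes with the work-group schedule."""
    outputs = {tuple(kernel(s, n_groups)) for s in amplified_schedules(n_groups, seed)}
    return len(outputs) > 1

print(schedule_dependent(racy_kernel, 8))       # True
print(schedule_dependent(race_free_kernel, 8))  # False
```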

We observe no unusual behaviour in any of the PolyBench programs. These 
programs split the computation into multiple kernels and the CPU program 
launches GPU kernels one by one. The transfer of control from the GPU to the 
CPU between kernels acts like a barrier as the CPU will wait until a kernel 
finishes before launching the next kernel. In addition, care has been taken in 
the kernel code to ensure threads do not access the same memory location. As 
a result, we observe no data races in PolyBench with our schedule amplifier. 


7 Conclusion 


We have presented the CLTestCheck framework for measuring test effectiveness over OpenCL kernels, with capabilities to measure code coverage, seed faults and measure mutation score, and amplify the execution of a test input with multiple work-group schedules to check inter work-group interactions. Our empirical evaluation of CLTestCheck capabilities with 82 publicly available kernels revealed the following:


1. The schedule amplifier was able to detect deadlocks and inter work-group 
data races in Parboil benchmarks when higher work-group ids were forced to 
execute first. This finding emphasizes the need for transparency and clearly 
stated assumptions on how work-groups are mapped to compute units. 

2. Barrier coverage served as a useful measure in identifying barrier divergence 
in benchmarks (scan). 

3. Branch coverage pointed to inadequacies in existing test suites and revealed that test inputs exercising error handling code were missing.

4. Across all benchmark suites, we found that arithmetic operator mutations, and relational operator mutations that changed < to <=, > to >= or vice versa, were hard to kill. More rigorous test suites to handle these mutations are needed.

5. The use of barrier mutations revealed several instances of unnecessary barrier use. Barrier usage and its implications are not well understood by developers, and barrier mutations can help reveal incorrect barrier uses.

6. Loop boundary mutations helped reveal unsafe accesses to loop indices outside the loop boundary.


In sum, the CLTestCheck framework is an automated, effective and useful tool that will help developers assess how well OpenCL kernels have been tested, identify kernel regions that require further testing, and uncover bugs with respect to work-group schedules. In the future, we plan to add further metrics, like data flow coverage with work-group schedules, to strengthen test adequacy measurement.
Abstract. This paper describes the development of a parallel simulator
of a multicore memory system from a model formalized as a structural 
operational semantics (SOS). Our implementation uses the Abstract 
Behavioral Specification (ABS) language, an executable, active object 
modelling language with a formal semantics, targeting distributed sys- 
tems. We develop general design patterns in ABS for implementing SOS, 
and describe their application to the SOS model of multicore memory 
systems. We show how these patterns allow a formal correctness proof 
that the implementation simulates the formal operational model and dis- 
cuss further parallelization and fairness of the simulator. 


1 Introduction 


Structural operational semantics (SOS) [1], introduced by Plotkin in 1981, 
describes system behavior as transition relations in a syntax-oriented, compo- 
sitional way, using inference rules for local transitions and their composition. 
Process synchronization in SOS rules is expressed abstractly using, e.g., asser- 
tions over system states and reachability conditions over transition relations as 
premises, and label synchronization for parallel transitions. This high level of 
abstraction greatly simplifies the verification of system properties, but not the simulation of system behavior, as execution quickly becomes a reachability problem with a lot of backtracking. In this paper, we study how to implement a parallel simulator with a formal correctness proof from an SOS model, through a case study of a multicore memory system. Such a correctness proof requires
that the implementation language is also defined formally by an operational 
semantics. 
A major challenge in software engineering is the exploitation of the computational power of multicore (and manycore) architectures. One important aspect of this challenge concerns the memory systems of these architectures. These memory systems generally use caches to avoid bottlenecks in data access from main memory, but caches introduce data duplication and require protocols to ensure coherence. Although data duplication is usually not visible to the programmer, the way a program interacts with these copies largely affects performance by moving data around to maintain coherence. To develop, test and optimize software for multicore architectures, we need correct, executable models of the underlying memory systems. An SOS model of multicore memory systems with correctness proofs for cache coherency has been described in [2], together with a prototype implementation in the rewriting logic system Maude [3]. However, this fairly direct implementation of the SOS model is not well suited to simulate large systems.

This paper considers an implementation of the SOS model in ABS [4], a lan- 
guage tailored to the description of distributed systems based on active objects 
[5]. ABS is formally defined by an operational semantics and supports parallel 
execution on backends in Erlang, Haskell, and Java. The following features of 
ABS allow a high-level, coarse-grained view of the execution of different method 
invocations by different active objects: encapsulation of local state in active 
objects, communication using asynchronous method calls and futures, and coop- 
erative scheduling of the method invocations of an active object. Our case study 
fully exploits these features and the resulting abstractions to correctly implement 
the complex process synchronization of the original SOS model. 

The main contributions of this paper are as follows: 


— We provide general design patterns in ABS for implementing structural oper- 
ational semantics with active objects, and apply these patterns to the imple- 
mentation in ABS of a structural operational semantics of multicore memory 
systems. 

— We show how these patterns allow a formal correctness proof of this imple- 
mentation by means of a simulation relation between the formal operational 
semantics of the ABS implementation and the operational model of multicore 
memory systems. 

— We discuss how these ABS design patterns can be used to further parallelize 
the implementation while preserving correctness. 

— Finally, we show how the ABS modeling concepts of symbolic time and vir- 
tual resources can be used to obtain a parallel implementation of the SOS 
model which abstractly ensures fairness between the progress of different par- 
allel components, independently of the number of cores that are used in the 
simulation. 


2 An Abstract Model of a Multicore Memory System 


Design decisions for a program running on top of a multicore memory system can be explored using simulators based on abstract models. Bijo et al. [2,6] developed a model which takes as input the tasks to be executed (expressed as data accesses), the corresponding data layout in main memory (indicating where
data is allocated), and a parallel architecture consisting of cores with private 
multi-level caches and shared memory (see Fig. 1). Additionally, the model is 
configurable in the number of cores, the number and size of caches, and the 
associativity and replacement policy. Memory is organized in blocks which move 
between caches and main memory. For simplicity, the model assumes that the 
size of cache lines and memory blocks in main memory coincide, abstracts from 
the data content of memory blocks, and transfers memory blocks from the caches 
of one core to the caches of another core via main memory. 


[Fig. 1 diagram: tasks waiting to be scheduled; cores with their caches, which exchange fetch/flush instructions; an abstract communication medium connecting the caches to main memory.]

Fig. 1. Abstract model of a multicore memory system.

Tasks from the program are scheduled for execution from a shared task pool. Task execution on a core requires memory blocks to be transferred from main memory to the closest cache. Each cache has a pool of fetch/flush instructions to move blocks among caches and between caches and main memory. Consistency between multiple copies of a memory block is ensured using the standard cache coherence protocol MSI (e.g., [7]), with which a cache line is either
modified, shared or invalid. A modified cache line has the most recent value of 
the memory block, therefore all other copies are invalid (including the one in 
main memory). A shared cache line indicates that all copies of the block are con- 
sistent. The protocol’s messages are broadcast to the cores. The details of the 
broadcast (e.g., on a mesh or a ring) can be abstracted into an abstract commu- 
nication medium. Following standard nomenclature, Rd messages request read 
access and RdX messages read exclusive access to a memory block. The latter 
invalidates other copies of the same block in other caches to provide write access. 
To access data from a block n, a core looks for n in its local caches. If n is not 
found in shared or modified state, a read request !Rd(n) is broadcast to the other 
cores and to main memory. The cache can fetch the block when it is available in 
main memory. Eviction is required if the cache is full. Writing to block n requires 
n to be in shared or modified state in the local cache; if it is in shared state, an 
invalidation request !RdX (n) is broadcast to obtain exclusive access. If a cache 
with block n in modified state receives a read request ?Rd(n), it flushes the block 
to main memory; if a cache with block n in shared state receives an invalidation 
request ?RdX(n), the cache line will be invalidated; the requests are discarded 
otherwise. Read and invalidation requests are broadcast instantaneously in the 
abstract model, reflecting that signalling on the communication medium is orders of magnitude faster than moving data to or from main memory.
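The per-cache-line behaviour of MSI described above can be summarised as a small transition function. This sketch collapses the cache hierarchy to a single level, omits block addresses, and uses None for an absent block (⊥); the names are ours. The MO to SH transition after flushing on ?Rd follows textbook MSI and is an assumption, since the text above only says the block is flushed.

```python
from enum import Enum
from typing import Optional, Tuple

class Status(Enum):
    MO = "mo"    # modified: the only up-to-date copy
    SH = "sh"    # shared: consistent with the other copies
    INV = "inv"  # invalid

Step = Tuple[Optional[Status], Optional[str]]  # (new status, None = ⊥; message or action)

def dual(msg: str) -> str:
    """A request !Rd / !RdX sent by one node is received by the others as ?Rd / ?RdX."""
    assert msg.startswith("!")
    return "?" + msg[1:]

def local_read(st: Optional[Status]) -> Step:
    if st in (Status.SH, Status.MO):
        return st, None                  # hit
    return st, "!Rd"                     # miss: request the block

def local_write(st: Optional[Status]) -> Step:
    if st is Status.MO:
        return st, None                  # already exclusive
    if st is Status.SH:
        return Status.MO, "!RdX"         # invalidate the other copies
    return st, "!Rd"                     # fetch the block first, then retry

def on_request(st: Optional[Status], msg: str) -> Step:
    if msg == "?Rd" and st is Status.MO:
        return Status.SH, "flush"        # write back to main memory (MSI assumption: now shared)
    if msg == "?RdX" and st is Status.SH:
        return Status.INV, None          # invalidate this copy
    return st, None                      # request discarded
```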
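$$
\begin{array}{ll}
\textit{Syntactic categories.} & \textit{Definitions.} \\
cid \in \mathit{CoreId} & cf \in \mathit{Config} ::= M \circ dap \circ \overline{Ca} \circ \overline{CR} \\
caid \in \mathit{CacheId} & CR \in \mathit{Core} ::= cid \bullet rst \\
n \in \mathit{Address} & Ca \in \mathit{Cache} ::= caid \bullet M \bullet dst \\
r \in \mathit{Ref} & st \in \mathit{Status} ::= \{mo, sh, inv\}
\end{array}
$$

$$
\begin{array}{lcl}
dap \in \mathit{AccessPtns} & ::= & \varepsilon \mid dap;dap \mid \mathit{read}(r) \mid \mathit{write}(r) \mid \mathit{commit}(r) \mid \mathit{commit} \mid dap \sqcap dap \mid dap^{*} \mid \mathit{skip} \mid \mathit{spawn}(dap) \\
rst \in \mathit{RunLang} & ::= & dap \mid rst;rst \mid \mathit{readBl}(r) \mid \mathit{writeBl}(r) \\
dst \in \mathit{DataLang} & ::= & \varepsilon \mid dst \mid \mathit{fetch}(n) \mid \mathit{flush}(n) \mid \mathit{fetchBl}(n) \mid \mathit{flush}
\end{array}
$$

Fig. 2. Syntax of runtime configurations, where over-bar denotes sets (e.g., $\overline{CR}$).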


2.1 Formalization of the Multicore Memory System as an SOS Model


An operational meaning for the abstract model described above has been defined using structural operational semantics (SOS) [1] with labeled transitions to model broadcast in the abstract communication medium. The resulting formalization [2,6] is shown to guarantee standard correctness properties for data consistency and cache coherence from the literature [8,9], including the preservation of program order in each core, the absence of data races, and no access to stale data. We briefly outline the main aspects of the formal model. The runtime syntax is given in Fig. 2. A configuration cf consists of main memory M, cores CR, caches Ca, and tasks dap to be scheduled. (We syntactically abuse set operations for multisets, including union ∪ and subtraction \.) A core cid • rst with identifier cid executes runtime statements rst. A cache with identifier caid has a local cache memory M and data instructions dst. We assume that caid encodes the cid of the core to which the cache belongs and its level in the cache hierarchy. We denote by Status ∪ {⊥} the extension of the set of status tags with the undefined value ⊥. Thus, a memory M : Address → Status ∪ {⊥} maps an address n either to a status tag in Status or to ⊥ if the memory block with address n is not found in M.

Data access patterns dap model tasks consisting of read(r) and write(r) operations to references r and control flow operations for sequential composition dap₁; dap₂, non-deterministic choice dap₁ ⊓ dap₂, repetition dap*, task creation spawn(dap), and commit, which flushes the entire cache after task execution. The empty access pattern is denoted ε. Cores execute runtime statements rst, which extend dap with readBl(r) and writeBl(r) to block execution while waiting for data. Caches execute data instructions dst to fetch and flush the memory block with address n; here fetchBl(n) blocks execution while waiting for data, and flush flushes the entire cache.

The abstract communication medium allows messages from one cache to be 
transmitted to the other caches and to main memory in a parallel instantaneous 
broadcast. Communication in the abstract communication medium is formalized 
in terms of label matching on transitions. The formal syntax for this label mech- 
anism is as follows: 
S ::= !Rd(n) | !RdX(n)        R ::= ?Rd(n) | ?RdX(n)


Here, for any address n, a request of the form !Rd(n) or !RdX(n) is sent by one node and its dual, dual(!Rd(n)) = ?Rd(n) or dual(!RdX(n)) = ?RdX(n), is broadcast to the rest of the nodes and main memory. The syntax of the model is further detailed in [2,6].


2.2 Local and Global SOS Rules 


The semantics is divided into local and global rules. Local rules capture interaction inside a node containing a core and its hierarchy of caches. Global rules capture synchronization and coordination between different nodes and main memory. In an initial configuration cf₀, all blocks in main memory M have status sh, all cores are idle, all caches are empty, and the task pool dap has a single task representing the main block of a program. Let cf ⇒ cf′ denote an execution starting from cf and reaching cf′ by applying global transition rules, which in turn apply local transition rules for each core and its cache hierarchy. In the rules, let the auxiliary function addr(r) return the address n of the block containing reference r, cid(caid) the identity of the core associated with cache caid, lid(caid) the cache level of caid, and status(M, n) the status of block n in map M. Let the predicate first(caid) hold when caid is the first-level cache and last(caid) when caid is the last-level cache. Note that unlabelled transitions → can be executed asynchronously, while labelled transitions, whose arrow carries a broadcast label, require synchronization between all the nodes and main memory (see Figs. 3 and 4). We discuss some representative rules for the local and global levels of the SOS model. The full SOS formalization can be found in [6].

Local semantics. The first rules of Fig. 3 involve a core and its first-level cache. In PRRD1, reading reference r succeeds if the block containing r is available. Otherwise, in PRRD2, a fetch(n) instruction is added to the data instructions dst of the first-level cache and further execution of the core is blocked by readBl(r). Writing to r only succeeds if the associated memory block has mo status in the first-level cache. If the cache line is shared, the core broadcasts a !RdX(n) request to acquire exclusive access, where the broadcast appears as a label on the transition in PRWR2. Otherwise, the block must be fetched from main memory in PRWR3 and writeBl(r) blocks execution.
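The core-level rules PRRD1, PRRD2, PRWR2 and PRWR3 can be read as a step function on a core and its first-level cache. This is an illustrative sketch, not the ABS implementation: the cache memory is a dict from block address to status (an absent entry plays the role of ⊥), dst is a Python set, the block address n = addr(r) is passed in directly, and the case of writing to an mo line (described in the text but not shown in Fig. 3) is included as an assumption.

```python
from typing import Dict, Optional, Set, Tuple

Mem = Dict[int, str]                # block address -> "sh" | "mo" | "inv"; absent = ⊥
Label = Optional[Tuple[str, int]]   # broadcast label such as ("!RdX", n), or None

def core_step(mem: Mem, dst: Set[Tuple[str, int]], op: str, n: int) -> Tuple[Label, str]:
    """One local step of a core issuing `op` on block n against its first-level
    cache (mem, dst). Returns (label, continuation): "rst" means the core
    proceeds, otherwise the name of the blocking statement."""
    status = mem.get(n)                       # None plays the role of ⊥
    if op == "read":
        if status in ("sh", "mo"):            # PRRD1: hit
            return None, "rst"
        mem.pop(n, None)                      # PRRD2: M[n -> ⊥], add fetch(n)
        dst.add(("fetch", n))
        return None, "readBl"
    if op == "write":
        if status == "mo":                    # write on an exclusive line (assumed rule)
            return None, "rst"
        if status == "sh":                    # PRWR2: upgrade and broadcast !RdX(n)
            mem[n] = "mo"
            return ("!RdX", n), "rst"
        mem.pop(n, None)                      # PRWR3: M[n -> ⊥], add fetch(n)
        dst.add(("fetch", n))
        return None, "writeBl"
    raise ValueError(f"unknown operation {op!r}")

mem, dst = {1: "sh"}, set()
print(core_step(mem, dst, "write", 1), mem)  # (('!RdX', 1), 'rst') {1: 'mo'}
print(core_step(mem, dst, "read", 2), dst)   # (None, 'readBl') {('fetch', 2)}
```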

For the remaining rules of Fig. 3, LC-HIT1 and LC-MISS1 capture interactions between adjacent levels of caches, and LLC-MISS1 a local state change in a cache line. If cache $caid_i$ needs a block n that is sh or mo in the next-level cache $caid_j$, the address where block n should be placed is decided by a function $select(M_i, n)$, which reflects the cache associativity and the replacement policy. If eviction is needed, block n in $caid_j$ will be swapped with the selected block in $caid_i$ in LC-HIT1. LC-MISS1 shows how fetch(n) instructions propagate to lower cache levels: fetch(n) is replaced by fetchBl(n) in $caid_i$ and added to the data instructions in $caid_j$. If the block cannot be found in any local cache, we have a cache miss: execution is blocked by fetchBl(n), and a read request !Rd(n) is broadcast, represented by the label in LLC-MISS1.
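The sketch below makes the propagation of fetch(n) concrete. It collapses LC-HIT1, LC-MISS1 and LLC-MISS1 into one recursive Python function over a single core's hierarchy, given as a list of status maps with index 0 as the first level. This is a simplification under several assumptions:

- In the SOS these are separate, possibly interleaved steps.
- The fetchBl(n) markers and the dst sets are not modelled.
- ⊥ is an absent key.
- make_select is a hypothetical replacement policy that evicts the lowest address. Like the select of Fig. 9, it returns None while there is free space.

```python
from typing import Callable, List, Optional, Tuple

SH, MO = "sh", "mo"

def make_select(capacity: int) -> Callable[[dict, int], Optional[int]]:
    """Hypothetical select(M_i, n): None if n fits, else the address to evict."""
    def select(memory: dict, n: int) -> Optional[int]:
        others = [k for k in memory if k != n]
        if len(others) < capacity:
            return None
        return min(others)            # assumed replacement policy: lowest address
    return select

def fetch(levels: List[dict], i: int, n: int,
          select: Callable[[dict, int], Optional[int]]) -> Optional[Tuple[str, int]]:
    """Resolve fetch(n) pending at level i; return the broadcast label, if any."""
    if i == len(levels) - 1:          # LLC-MISS1: last(caid), n not valid here
        levels[i].pop(n, None)
        return ("!Rd", n)
    j = i + 1                         # lid(caid_j) = lid(caid_i) + 1
    s_j = levels[j].get(n)
    if s_j in (SH, MO):               # LC-HIT1: swap n with the selected block n_i
        n_i = select(levels[i], n)
        levels[j].pop(n)              # M_j[n -> ⊥]
        if n_i is not None:
            s_i = levels[i].pop(n_i)  # M_i[n_i -> ⊥]
            levels[j][n_i] = s_i      # M_j[n_i -> s_i]
        levels[i][n] = s_j            # M_i[n -> s_j]
        return None
    levels[j].pop(n, None)            # LC-MISS1: M_j[n -> ⊥], fetch(n) moves to level j
    return fetch(levels, j, n, select)
```

A hit in the second level swaps the two blocks. A block absent from every level ends in the !Rd(n) broadcast of LLC-MISS1.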
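$$\text{(PRRD1)}\quad \frac{n = addr(r) \quad first(caid) = true \quad cid(caid) = c \quad status(M,n) \in \{sh, mo\}}{(caid \bullet M \bullet dst) \circ (c \bullet read(r);rst) \rightarrow (caid \bullet M \bullet dst) \circ (c \bullet rst)}$$

$$\text{(PRRD2)}\quad \frac{n = addr(r) \quad first(caid) = true \quad cid(caid) = c \quad status(M,n) \in \{inv, \bot\}}{(caid \bullet M \bullet dst) \circ (c \bullet read(r);rst) \rightarrow (caid \bullet M[n \mapsto \bot] \bullet dst \cup \{fetch(n)\}) \circ (c \bullet readBl(r);rst)}$$

$$\text{(PRWR2)}\quad \frac{n = addr(r) \quad first(caid) = true \quad cid(caid) = c \quad status(M,n) = sh}{(caid \bullet M \bullet dst) \circ (c \bullet write(r);rst) \xrightarrow{!RdX(n)} (caid \bullet M[n \mapsto mo] \bullet dst) \circ (c \bullet rst)}$$

$$\text{(PRWR3)}\quad \frac{n = addr(r) \quad first(caid) = true \quad cid(caid) = c \quad status(M,n) \in \{inv, \bot\}}{(caid \bullet M \bullet dst) \circ (c \bullet write(r);rst) \rightarrow (caid \bullet M[n \mapsto \bot] \bullet dst \cup \{fetch(n)\}) \circ (c \bullet writeBl(r);rst)}$$

$$\text{(LC-HIT1)}\quad \frac{\begin{array}{c} status(M_i,n_i) = s_i \quad status(M_j,n) = s_j \quad s_j \in \{sh, mo\} \\ lid(caid_j) = lid(caid_i)+1 \quad cid(caid_i) = cid(caid_j) \quad select(M_i,n) = n_i \end{array}}{(caid_i \bullet M_i \bullet dst_i \cup \{fetch(n)\}) \circ (caid_j \bullet M_j \bullet dst_j) \rightarrow (caid_i \bullet M_i[n_i \mapsto \bot, n \mapsto s_j] \bullet dst_i) \circ (caid_j \bullet M_j[n \mapsto \bot, n_i \mapsto s_i] \bullet dst_j)}$$

$$\text{(LC-MISS1)}\quad \frac{lid(caid_j) = lid(caid_i)+1 \quad cid(caid_i) = cid(caid_j) \quad status(M_j,n) \in \{inv, \bot\}}{(caid_i \bullet M_i \bullet dst_i \cup \{fetch(n)\}) \circ (caid_j \bullet M_j \bullet dst_j) \rightarrow (caid_i \bullet M_i \bullet dst_i \cup \{fetchBl(n)\}) \circ (caid_j \bullet M_j[n \mapsto \bot] \bullet dst_j \cup \{fetch(n)\})}$$

$$\text{(LLC-MISS1)}\quad \frac{last(caid) = true \quad status(M,n) \in \{inv, \bot\}}{(caid \bullet M \bullet dst \cup \{fetch(n)\}) \xrightarrow{!Rd(n)} (caid \bullet M[n \mapsto \bot] \bullet dst \cup \{fetchBl(n)\})}$$

Fig. 3. Local transition rules.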


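$$\text{(SYNCH1)}\quad \frac{S \neq \emptyset \quad R = dual(S) \quad M \xrightarrow{R} M' \quad Ca \circ CR \xrightarrow{S} Ca' \circ CR'}{M \circ dap \circ Ca \circ CR \rightarrow M' \circ dap \circ Ca' \circ CR'}$$

$$\text{(SYNCH2)}\quad \frac{\begin{array}{c} CR = \{CR_1\} \uplus CR_2 \quad Ca = Ca_1 \uplus Ca_2 \quad belongs(Ca_1, \{CR_1\}) \quad belongs(Ca_2, CR_2) \quad R = dual(S) \\ Ca_1 \circ CR_1 \xrightarrow{S} Ca_1' \circ CR_1' \quad Ca_2 \xrightarrow{R} Ca_2' \quad CR' = \{CR_1'\} \cup CR_2 \quad Ca' = Ca_1' \cup Ca_2' \end{array}}{Ca \circ CR \xrightarrow{S} Ca' \circ CR'}$$

$$\text{(ASYNCH)}\quad \frac{\begin{array}{c} CR = CR_1 \uplus CR_2 \uplus CR_3 \quad Ca = Ca_1 \uplus Ca_2 \uplus Ca_3 \uplus Ca_4 \quad belongs(Ca_3, CR_3) \\ M \circ Ca_1 \rightarrow M' \circ Ca_1' \quad Ca_2 \rightarrow Ca_2' \quad dap \circ CR_1 \rightarrow dap' \circ CR_1' \quad Ca_3 \circ CR_3 \rightarrow Ca_3' \circ CR_3' \\ CR' = CR_1' \cup CR_2 \cup CR_3' \quad Ca' = Ca_1' \cup Ca_2' \cup Ca_3' \cup Ca_4 \end{array}}{M \circ dap \circ Ca \circ CR \rightarrow M' \circ dap' \circ Ca' \circ CR'}$$

Fig. 4. Global transition rules.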


Global semantics. The global rules synchronize the cache hierarchies of different cores and main memory, and ensure coherence. Selected global rules are given in Fig. 4. Rule SYNCH1 captures a global step with synchronization on a label S, which can be either !Rd(n) or !RdX(n). The request will be broadcast to other caches. To maintain data consistency, these caches must process the requests at the same time. The receiving label R is the dual of S. For synchronization, the transition is decomposed into a premise for main memory with label R and another premise for the caches with label S. Rule SYNCH2 distributes the receiving label to the caches $Ca_2$, which do not belong to the cache hierarchy of the sender core $CR_1$. The predicate belongs(Ca, CR) expresses that any cache in Ca belongs to exactly one core in CR. Rule ASYNCH captures parallel transitions without a label. These transitions can be local to individual nodes and caches, parallel memory accesses, or the parallel spawning and scheduling of new tasks.
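The following Python sketch illustrates label matching in SYNCH1 and SYNCH2. A sending label S is paired with R = dual(S), and every hierarchy other than the sender's takes an R-labelled step. The "?" notation for receiving labels is assumed. The reaction of the receiving caches is also an assumption, since it is not spelled out in this excerpt: an MSI-style protocol in which ?RdX(n) invalidates n and ?Rd(n) downgrades mo to sh. Main memory's R-labelled premise is omitted.

```python
from typing import List, Tuple

Label = Tuple[str, int]   # e.g. ("!Rd", n) when sent, ("?Rd", n) when received

def dual(label: Label) -> Label:
    """R = dual(S): turn a sending label !X(n) into the receiving label ?X(n)."""
    kind, n = label
    if not kind.startswith("!"):
        raise ValueError("only sending labels have a dual here")
    return "?" + kind[1:], n

def synch(sender: int, hierarchies: List[List[dict]], label: Label) -> Label:
    """One synchronized step: all hierarchies except the sender's receive dual(S).

    ASSUMPTION: receivers react MSI-style (?RdX invalidates n, ?Rd turns mo into sh);
    main memory's participation is not modelled.
    """
    received = dual(label)
    kind, n = received
    for h, caches in enumerate(hierarchies):
        if h == sender:
            continue                  # the sender's hierarchy takes the S-labelled step
        for memory in caches:
            status = memory.get(n)
            if status is None:
                continue              # n not cached here: nothing to do
            if kind == "?RdX":
                memory[n] = "inv"
            elif kind == "?Rd" and status == "mo":
                memory[n] = "sh"
    return received
```

Because every receiver is updated inside one call, the step is instantaneous from the caller's point of view. This is the property that the ABS model has to recover with the bus and barriers.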


3 The ABS Model of the Multicore Memory System 


In this section we outline the translation of the formal model into an exe- 
cutable object-oriented model using the ABS modeling language. We first briefly 
introduce the language and later explain the structural and behavioural corre- 
spondence between these two models, with a focus on the main challenges. 


3.1 The ABS Language 


ABS is a modeling language for designing, verifying, and executing concurrent 
software [4]. The language combines the syntax and object-oriented style of Java 
with the Actor model of concurrency [10] into active objects which decouple 
communication and synchronization using asynchronous method calls, futures 
and cooperative scheduling [5]. Although only one thread of control can execute 
in an active object at any time, cooperative scheduling allows different threads 
to interleave at explicitly declared points in the code. Access to an object’s 
fields is encapsulated, so any non-local (outside of the object) read or write to 
fields must happen explicitly via asynchronous method calls so as to mitigate 
race-conditions or the need for mutual exclusion (locks). 

We explain the basic mechanism of asynchronous method calls and cooperative scheduling in ABS by the simple code example of a class Bus in Fig. 5. First, the execution of a statement res = await o!m(args) consists of storing a message m(args), corresponding to the asynchronous call, in the message pool of the callee object o. This await statement releases the control of the caller until the return value of that method has been received. Releasing the control means that the caller can execute other messages from its own message pool in the meantime. ABS supports the shorthand o.m(args) to make an asynchronous call f = o!m(args) followed by the operation f.get, which blocks the caller object (does not release control) until the future f has received the return value from the call. As a special case, the statement this.m(args) models a self-call, which corresponds to a standard subroutine call and avoids this blocking mechanism. The code in Fig. 5 illustrates the use of the await statement on a Boolean condition to model a binary semaphore, which is used to enforce exclusive access to a communication medium implemented as a "bus". Thus, the statement await bus!lock_bus() suspends the calling method invocation (releasing control in the caller object). The caller is resumed once the generated invocation of lock_bus on the bus has returned, which happens only after that invocation has itself been resumed because the local condition unlocked of the bus has become true.

```
class Bus {
  Bool unlocked = True;
  Unit lock_bus() { await unlocked; unlocked = False; }
  Unit release_bus() { unlocked = True; }
}
```

Fig. 5. Bus lock implementation in ABS using await on Booleans.

Core: IScheduler sched; ICache l1; SstList currentTask
RRScheduler: List<SstList> q = Nil; SstList getTask(); Unit putTask(SstList newTask)
Cache (interface ICache): IBus bus; IMemory mainMemory; Maybe<ICache> nextLevel; MemMap cacheMemory; Unit read(Reference r); Unit write(Reference r); Unit commit(Reference r); Unit commitAll(); Unit fetch(Address a); Unit flush(Address a); Unit flushAll(); Maybe<Status> swap(Address a_out, Maybe<CacheLine> m_in); Unit fetchFromMain(Address a, ICache sender); Unit receiveRd(Address a, IBarrier start, IBarrier end, ICache sender); Unit receiveRdX(Address a, IBarrier start, IBarrier end, ICache sender)
Barrier: Int nbrOfCaches
Bus: IMemory mainMemory; Bool unlocked; List<ICache> caches; Unit lock_bus(); Unit release_bus(); Unit sendRd(Address b, ICache sender); Unit sendRdX(Address b, ICache sender)
Memory: Status fetchM(Address b); Unit flushM(Address b); Unit receiveRdXM(Address a)

Fig. 6. Class diagram of the ABS model.
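Python's asyncio also schedules tasks cooperatively, so it can serve as a loose analogue of the await-on-Boolean pattern of Fig. 5. This sketch is an analogy, not ABS semantics. An asyncio.Condition stands in for the guard await unlocked, and asyncio.sleep(0) marks a point where other tasks may run. The task names c1 and c2 and the helper critical are made up for the example.

```python
import asyncio
from typing import List

class Bus:
    """Analogue of Fig. 5: a binary semaphore built from a Boolean guard."""

    def __init__(self) -> None:
        self.unlocked = True
        self._changed = asyncio.Condition()   # wakes tasks waiting on the guard

    async def lock_bus(self) -> None:
        async with self._changed:
            # Like 'await unlocked;': suspend until the guard holds.
            await self._changed.wait_for(lambda: self.unlocked)
            self.unlocked = False

    async def release_bus(self) -> None:
        async with self._changed:
            self.unlocked = True
            self._changed.notify_all()

async def critical(bus: Bus, name: str, log: List[str]) -> None:
    await bus.lock_bus()
    log.append(f"{name} enter")
    await asyncio.sleep(0)    # a release point: other tasks run but cannot enter
    log.append(f"{name} exit")
    await bus.release_bus()

async def demo() -> List[str]:
    bus = Bus()               # created inside the running event loop
    log: List[str] = []
    await asyncio.gather(critical(bus, "c1", log), critical(bus, "c2", log))
    return log

log = asyncio.run(demo())
```

Although c1 yields control while it holds the lock, c2 cannot enter until release_bus runs. The log therefore shows two non-overlapping enter/exit pairs.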


3.2 The Structural View 


The runtime syntax of the SOS is represented by ABS classes, as outlined in 
Fig. 6. We briefly overview the translation. In ABS, object identifiers guarantee 
unique names and object references are used to capture how cores and caches 
are related. These references are encoded in a one-to-one correspondence with 
the naming scheme of the SOS. 

A core $c \bullet rst$ is translated into a class Core with a field currentTask representing the current task rst. Each core holds a reference to its first-level cache. A cache memory $caid \bullet M \bullet dst$ is translated into a class Cache with an interface ICache and a class parameter nextLevel. In a cache, nextLevel holds a reference to the next-level cache; if this reference is Nothing, the cache is the last-level cache (in the SOS, the predicate last identifies the last level). The field cacheMemory models the cache's memory M in the SOS. The process pool of each cache object in ABS represents the data instruction set dst.
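The correspondence between nextLevel and the predicate last can be sketched with Python dataclasses. Field names mirror Fig. 6. The helpers build_core and hierarchy are hypothetical, and giving each core a private chain of cache levels is an assumption made only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Cache:
    name: str
    next_level: Optional[Cache] = None                 # ABS: Maybe<ICache> nextLevel
    cache_memory: dict = field(default_factory=dict)   # ABS: MemMap cacheMemory (M)

    def is_last(self) -> bool:
        """SOS predicate last(caid): there is no next level (Nothing)."""
        return self.next_level is None

@dataclass
class Core:
    name: str
    l1: Optional[Cache]                                # reference to the first-level cache
    current_task: List[str] = field(default_factory=list)  # ABS: SstList currentTask (rst)

def build_core(name: str, depth: int) -> Core:
    """Hypothetical helper: a core with a private chain of `depth` cache levels."""
    nxt: Optional[Cache] = None
    for level in range(depth, 0, -1):
        nxt = Cache(f"{name}.L{level}", next_level=nxt)
    return Core(name, l1=nxt)

def hierarchy(core: Core) -> List[Cache]:
    """Follow nextLevel references from the first level down to the last level."""
    out, cache = [], core.l1
    while cache is not None:
        out.append(cache)
        cache = cache.next_level
    return out
```

Object references thus play the role of the SOS naming scheme. The level of a cache (lid) is its position in the chain, and last holds exactly where the chain ends.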

An ABS configuration consists of a number of cores with their corresponding cache hierarchies, the main memory, a scheduler with tasks waiting to be scheduled, and the ABS classes Bus and Barrier, which model the abstract communication medium and the global synchronization with labels !Rd(n) and !RdX(n) in the SOS. The object diagram in Fig. 7 shows an initial configuration corresponding to the one depicted in Fig. 1.

Fig. 7. Object diagram of an initial configuration.


3.3 The Behavioral View


We discuss in this section the design patterns in ABS that implement the synchronization inherent in the SOS model. We observe here that the combination of asynchronous method calls and cooperative scheduling is crucial because of the multitasking inherent in the SOS model, which requires that objects be able to process other requests; e.g., caches need to flush memory blocks while waiting for a fetch to succeed.


Local synchronization in the SOS model between two structural entities (e.g., two caches in rule LC-HIT1 of Fig. 3) is implemented by the following synchronization pattern in ABS (see Fig. 8). Given two objects o1 and o2, let o1 execute method m1, which checks the local conditions of o1 (highlighted as region A in Fig. 8). If these local conditions hold, method m2 on o2 is called asynchronously. Method m2 completes when the local conditions of o2 hold (highlighted as region B in Fig. 8). However, when m2 has returned and object o1 again schedules method m1, the conditions on object o2 need no longer hold. Therefore, o1 next calls method m3 synchronously to check these conditions again. If these conditions still hold, method m3 returns successfully (in general, having updated o2), and we can proceed to do the local changes in o1 (highlighted as region C in Fig. 8). Otherwise, the process needs to be repeated until it succeeds. Note that method m3 should not contain release points: because this method is called synchronously from a different object, a release point would in general have the potential of introducing deadlocks in the caller object.

Fig. 8. Local synchronization between two ABS objects.

```
Just(nextCache) => {
  Maybe<Status> s = Nothing;
  Maybe<CacheLine> selected = Nothing;
  while (s == Nothing) {
    retValue = await nextCache!fetch(n);
    selected = select(cacheMemory, maxSize, n);
    s = nextCache.swap(n, selected, name);
  }
  case selected {
    Nothing => skip;
    Just(Pair(n1, _)) => cacheMemory = removeKey(cacheMemory, n1);
  }
  cacheMemory = put(cacheMemory, n, fromJust(s));
}
```

Fig. 9. Extract of ABS method fetch. When this code is reached, the requested cache line n has status invalid or it is not in the cache. The function select chooses a cache line to be swapped with n. If there is still free space in the cache, select returns Nothing. If n has either shared or modified status in the next-level cache, the method swap removes the cache line with address n, inserts the selected cache line and returns the current status of n; otherwise, swap simply returns Nothing.
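The check/call/re-check pattern of Fig. 8 can be sketched with asyncio. The class names Resource and Requester and the method provide are hypothetical. The extra asyncio.sleep(0) after m2 is an assumption that models o1 having to be rescheduled before it continues, which is the point where another requester can interleave. m3 contains no await, so its test and update happen without a release point.

```python
import asyncio

class Resource:
    """Plays o2: offers a unit that several requesters compete for."""
    def __init__(self) -> None:
        self.available = False
        self.changed = asyncio.Condition()

    async def m2(self) -> None:
        # Region B: complete only once o2's local condition holds.
        async with self.changed:
            await self.changed.wait_for(lambda: self.available)

    def m3(self) -> bool:
        # Synchronous re-check with no release point: test and update o2 atomically.
        if self.available:
            self.available = False
            return True
        return False

    async def provide(self) -> None:
        async with self.changed:
            self.available = True
            self.changed.notify_all()

class Requester:
    """Plays o1."""
    def __init__(self) -> None:
        self.done = False

    async def m1(self, o2: Resource) -> int:
        if self.done:                  # Region A: o1's local condition
            return 0
        attempts = 0
        while True:
            attempts += 1
            await o2.m2()              # asynchronous call to o2
            await asyncio.sleep(0)     # o1 must be rescheduled: others may interleave here
            if o2.m3():                # the conditions on o2 may no longer hold
                self.done = True       # Region C: local changes in o1
                return attempts

async def demo():
    o2 = Resource()
    a, b = Requester(), Requester()
    ta = asyncio.create_task(a.m1(o2))
    tb = asyncio.create_task(b.m1(o2))
    await asyncio.sleep(0)             # both requesters now wait inside m2
    await o2.provide()                 # one unit: only one m3 can succeed
    for _ in range(5):
        await asyncio.sleep(0)
    await o2.provide()                 # second unit for the requester that retried
    return await ta, await tb

results = asyncio.run(demo())
```

Both m2 calls complete after the first provide, but only one m3 succeeds. The other requester loops back, which is the repetition described above.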

To illustrate the above protocol, consider the code snippet in Fig. 9, which corresponds to part of several rules in the SOS (in particular, rule LC-HIT1). Here, the current object this corresponds to $caid_i$ in the SOS, running method fetch, and the object referenced by nextCache corresponds to $caid_j$. When fetch on nextCache returns, all the required conditions in nextCache hold. However, since the call is asynchronous, (some of) the conditions may no longer hold when execution continues in this. This is addressed by checking the return value of method swap. If swap returns a status (i.e., a value other than Nothing), the conditions still hold, and the necessary updates are performed both locally and in nextCache. Otherwise (when swap returns Nothing), fetch is called again.
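A sequential Python rendering of the retry loop of Fig. 9 follows, with select and swap behaving as described in its caption. The victim choice in select (lowest address) is an assumed replacement policy. The callback wait_next is a hypothetical stand-in for await nextCache!fetch(n); the test uses it to simulate a first attempt whose conditions have already been invalidated.

```python
from typing import Callable, Optional, Tuple

CacheLine = Tuple[int, str]   # (address, status)

def select(memory: dict, max_size: int, n: int) -> Optional[CacheLine]:
    """Hypothetical select: None while n fits, else a victim line to swap out."""
    if n in memory or len(memory) < max_size:
        return None
    victim = min(memory)          # assumed replacement policy: lowest address
    return victim, memory[victim]

def swap(next_memory: dict, n: int, selected: Optional[CacheLine]) -> Optional[str]:
    """Synchronous re-check in the next level, following the caption of Fig. 9."""
    status = next_memory.get(n)
    if status not in ("sh", "mo"):
        return None               # conditions no longer hold: caller must retry
    del next_memory[n]
    if selected is not None:
        addr, line_status = selected
        next_memory[addr] = line_status
    return status

def fetch_from_next(memory: dict, max_size: int, next_memory: dict, n: int,
                    wait_next: Callable[[int], None]) -> int:
    """Retry loop of Fig. 9; returns the number of attempts needed."""
    attempts, s, selected = 0, None, None
    while s is None:
        attempts += 1
        wait_next(n)              # stands in for 'await nextCache!fetch(n)'
        selected = select(memory, max_size, n)
        s = swap(next_memory, n, selected)
    if selected is not None:
        del memory[selected[0]]   # evict the line that was moved to the next level
    memory[n] = s
    return attempts
```

In this sketch, swap both re-checks the condition and performs the update in one synchronous call. That is why it corresponds to method m3 in the pattern of Fig. 8.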


Global synchronization in the SOS (see Fig. 10a) is modelled by matching 
labelled transitions. To simulate this instantaneous communication in ABS, we 
introduced the classes Bus and Barrier. The synchronization protocol is activated 
by asynchronous calls to the respective methods sendRd and sendRdX of the bus. 
The bus subsequently asynchronously calls the corresponding methods receiveRd 
and receiveRdX of the caches. Two barriers start and end are used by the caches 
to synchronize the start, as well as the completion, of the local executions of 
methods receiveRd and receiveRdX. 
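The role of the two barriers can be sketched in asyncio. A single-use barrier built from asyncio.Condition ensures that every receiving cache has arrived before any of them performs its local update. A second barrier ensures that the bus's send completes only when all of them have finished. This is a hypothetical analogue of the ABS classes, not the model itself. The invalidation performed on ?RdX(n) is an assumed MSI-style reaction, and bus locking is left out.

```python
import asyncio
from typing import List, Tuple

class Barrier:
    """Single-use barrier for `parties` participants (plays the ABS class Barrier)."""
    def __init__(self, parties: int) -> None:
        self.parties = parties
        self.arrived = 0
        self.cond = asyncio.Condition()

    async def wait(self) -> None:
        async with self.cond:
            self.arrived += 1
            if self.arrived >= self.parties:
                self.cond.notify_all()
            else:
                await self.cond.wait_for(lambda: self.arrived >= self.parties)

class Cache:
    def __init__(self, name: str, log: List[Tuple[str, str]]) -> None:
        self.name, self.log, self.memory = name, log, {}

    async def receive_rdx(self, n: int, start: Barrier, end: Barrier) -> None:
        self.log.append(("arrive", self.name))
        await start.wait()            # all receivers start together
        if n in self.memory:          # ASSUMPTION: MSI-style invalidation on ?RdX(n)
            self.memory[n] = "inv"
        self.log.append(("work", self.name))
        await end.wait()              # ... and finish together

class Bus:
    def __init__(self, caches: List[Cache]) -> None:
        self.caches = caches

    async def send_rdx(self, n: int, sender: Cache) -> None:
        receivers = [c for c in self.caches if c is not sender]
        start, end = Barrier(len(receivers)), Barrier(len(receivers))
        # Asynchronous calls to every other cache; done once all have passed `end`.
        await asyncio.gather(*(c.receive_rdx(n, start, end) for c in receivers))

async def demo():
    log: List[Tuple[str, str]] = []
    caches = [Cache(f"c{i}", log) for i in range(4)]
    for c in caches:
        c.memory[7] = "sh"
    await Bus(caches).send_rdx(7, caches[0])
    return log, caches

log, caches = asyncio.run(demo())
```

All "arrive" entries precede all "work" entries in the log. This is how the start barrier approximates the simultaneous processing that label matching gives for free in the SOS.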

However, observe that objects in ABS are input enabled: it is always possible to call a method on an object. In our model, this scheme may give rise to inconsistent states: the local status of a memory location which triggers an asynchronous call of one of the methods sendRd and sendRdX of the bus may be invalidated by other bus synchronizations. Therefore, we add a lock to the bus (see Figs. 5 and 6), which is used to ensure exclusive access to the message pool of the bus when one of the methods read, write, and fetch is executed. The lock is released in case bus synchronization is not needed. The overall scheme is depicted in Fig. 10b. The exclusive access to the message pool of the bus guarantees that the message pool of the bus contains at most one call to one of the methods sendRd and sendRdX. Consequently, the triggering condition of the call cannot be invalidated before the call has been executed. This strict locking strategy decreases concurrency in the distributed system, but reduces the complexity of the proof of equivalence between the SOS and the distributed implementation. We discuss how to further enhance the parallelization in Sect. 5.

Fig. 10. Synchronization in SOS vs ABS. (a) State machine of the global synchronization using labels in the SOS model. (b) State machine of the global synchronization using a bus and barriers in the ABS model. In the SOS model (a), circles represent nodes in the memory system and shaded arrows labelled transitions. Note that the bus is implicit in the SOS model, as synchronization is captured by label matching. In the ABS model (b), circles represent the same nodes as in the SOS model, shaded arrows method invocations, solid arrows mutual access to the bus object, and dotted arrows barrier synchronizations.


4 Correctness 


In this section we discuss the correctness of the ABS model by means of a 
simulation relation between the transition system describing the semantics of the 
ABS model of the multicore memory system and the transition system described 
by the SOS model. 

The semantics of an ABS model can be described by a transition relation between global configurations. A global configuration is a (finite) set of object configurations. An object configuration is a tuple of the form $(oid, \sigma, p, Q)$, where oid denotes the unique identity of the object, $\sigma$ assigns values to the instance variables (fields) of the object, p denotes the currently executing process, and Q denotes a set of (suspended) processes. A process is a closure $(\tau, S)$ consisting of an assignment $\tau$ of values to the local variables of the statement S.

We refer to [4] for the details of the structural operational semantics for deriving transitions $G \rightarrow G'$ between global configurations in ABS. In ABS, concurrent objects only interact via asynchronous method calls, and processes are scheduled non-deterministically (which provides an abstraction from the order in which the processes are generated by method calls). Hence the ABS semantics satisfies the following global confluence property, which allows consecutive computation steps of independent processes belonging to different objects to be commuted. Two processes are independent if neither one is generated by the other by an asynchronous call.


Lemma 1 (Global confluence). For any two transitions $G \rightarrow G_1$ and $G \rightarrow G_2$ that describe execution steps of independent processes of different objects, there exists a global configuration $G'$ such that $G_1 \rightarrow G'$ and $G_2 \rightarrow G'$.


An object configuration is stable if the statement S to be executed has terminated or starts either with a get operation on a future or with an await statement on a Boolean condition or a future. A global ABS configuration is stable if all its object configurations are stable. Observe that our ABS model does not give rise to local divergent computations without passing through stable configurations; i.e., every local computation eventually enters a stable configuration. Together with the global confluence property in Lemma 1, this allows us to restrict the semantics of the ABS model in the simulation relation to stable global configurations. In other words, we only consider transitions $G \Rightarrow G'$ between stable global configurations G and $G'$ which result from a (non-empty) sequence of local execution steps of a single process from one stable configuration to the next.
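The stability condition is a simple syntactic check on the statement an object is about to execute. Here it is in Python, with object configurations $(oid, \sigma, p, Q)$ as a dataclass. The tuple encoding of statements as (kind, *args) is a hypothetical choice for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Stmt = Tuple[str, ...]   # hypothetical encoding of a statement: (kind, *args)

@dataclass
class ObjectConfig:
    oid: str                                               # unique object identity
    fields: dict = field(default_factory=dict)             # sigma: instance variables
    active: List[Stmt] = field(default_factory=list)       # remaining statement S of p
    suspended: List[List[Stmt]] = field(default_factory=list)  # Q

def is_stable(obj: ObjectConfig) -> bool:
    """Stable: S has terminated, or starts with a get or an await."""
    return not obj.active or obj.active[0][0] in ("get", "await")

def is_globally_stable(config: List[ObjectConfig]) -> bool:
    """A global configuration is stable if all its object configurations are."""
    return all(is_stable(o) for o in config)
```

A transition $G \Rightarrow G'$ in the restricted semantics then runs one process from a configuration where is_globally_stable holds until it holds again.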

Because of the global synchronization with the bus in ABS described above, we may also, without loss of generality, represent the synchronization on the bus by a single global transition $G \Rightarrow G'$ which involves a completed execution of the method sendRd(...) (or sendRdX(...)) by the bus. This is justified because the global confluence allows for a scheduling policy in which the processes generated by these methods, i.e., the calls of the methods receiveRd(...) (or receiveRdX(...)), are not interleaved with any other processes.


The simulation relation. The structural correspondence between a global configuration of the ABS model and a configuration of the SOS model is described in Sect. 3.2. For each method we have constructed a table which, among other things, associates a corresponding dst instruction with some occurrences of await statements in the method body; we call these occurrences observable. In general, the execution of the remaining (occurrences of) await statements, for which there does not exist a corresponding dst instruction, involves some asynchronous messaging preparing for the corresponding synchronous exchange of information in the SOS model. In some cases, the execution of these unobservable statements (e.g., in the read and write methods) also does not correspond to a change of the SOS configuration. Let $\alpha$ map every stable global configuration G of the ABS model to a structurally equivalent configuration $\alpha(G)$ of the SOS model, which additionally maps every observable process (either queued or active) to the associated dst instruction (a process is observable if its corresponding statement is observable).

We arrive at the following theorem which expresses that the ABS model is a 
correct implementation of the abstract model. 


Theorem 1. Let G be a stable global configuration of the ABS model. If $G \Rightarrow G'$, then $\alpha(G) \rightarrow^* \alpha(G')$, where $\rightarrow^*$ denotes the reflexive, transitive closure of $\rightarrow$.


Proof. The proof proceeds by a case analysis of the given transition $G \Rightarrow G'$, which, as discussed above, involves the local execution of some basic sequential code by a single object. For example, for the case of a completed execution of a method sendRd(...) (or sendRdX(...)) by the bus, a simple inspection of the sequential code of the methods that have been executed, e.g., sendRd(...) and receiveRd(...), suffices to establish the existence of a corresponding transition sequence $\alpha(G) \rightarrow^* \alpha(G')$.

The remaining cases are captured by tables (as mentioned above) which pro- 
vide for each method the following information. The statements in the Location 
column of each table represent for the respective method all possible processes 
generated by a call, i.e., a call to the method itself, and the processes which 
correspond to the await statements appearing in its body. In each row the Next 
release point statement indicates the next await statement or return state- 
ment that can be reached (statically). The dst instruction in each row specifies 
the instruction which corresponds to the Location statement in the simula- 
tion. Finally, Enable condition in each row specifies the enabling conditions 
(expressed in the abstract model) of the rule applications (of the abstract model) 
specified in Rules. In general these rule applications involve the sequential appli- 
cation of one or more rules. For unobservable statements, for which there is no 
corresponding dst instruction, the latter two columns are left unspecified. 

The case analysis then consists of checking statically for each row the local 
structural correspondence between the resulting ABS process (the Next release 
point) and the resulting SOS configuration described by the specified rule appli- 
cations. 


5 Parallelism and Fairness of the ABS Model 


This section discusses how to relax the eager locking policy of the bus imple- 
mentation, without generating inconsistent states. Instead of locking the bus 
unconditionally when executing the read, write, and fetch methods in the ABS 
model, and releasing the lock when no bus synchronization is required, we only 
lock the bus when the triggering conditions of the bus synchronization may be 
invalidated. For example, an optimistic write implementation (see Fig. 11) tries to acquire the lock of the bus, and only after the acquisition checks whether a race condition has happened and invalidated the shared status of the address n. In that case, the write method backtracks and retries (by calling itself); otherwise, the write operation can safely be performed.
```
Int write(Ref r) {
  case lookup(cacheMemory, addr(r)) {
    Just(Sh) => {
      await bus!lock_bus();
      // after waking up do RACE DETECTION
      if (lookup(cacheMemory, addr(r)) == Just(Sh)) { // NO RACE
        await bus!sendRdX(addr(r), this);
        await bus!release_bus();
        cacheMemory = put(cacheMemory, addr(r), Mo);
      }
      else { // RACE CONDITION
        await bus!release_bus();
        await this!write(r); // RETRY
      }
    }
  }
}
```

Fig. 11. Alternative, optimistic implementation of the write method to detect a bus race condition and, in that case, retry the operation.

The strict and relaxed variations of the global synchronization bear strong resemblance to conservative [11,12] and optimistic [13] algorithms, respectively, in parallel and distributed discrete-event simulation (PDES) [14]. As with PDES, there is no clear winner between the strict (conservative) and relaxed (optimistic) versions of our cache simulator. Certain computer programs (input models) will be simulated faster using one version or the other, depending on the interdependency of the parallel components (for us, the caches). For a contrived experiment, we implemented a penalty system in the ABS model. A cache penalty is the cost (delay) incurred by failing to read or write to a particular level of cache, set here to (L1, L2, L3) = (1, 10, 100) [15]. We compared the two versions for a scenario with full interdependency (simultaneous write instructions on the same memory block) and a scenario with minimal interdependency (write instructions on separate memory blocks) between 16 simulated cores. In these experiments, the strict version was faster by up to 2% in the first case and slower by up to 12% in the second case. The experiments were executed using the ABS-Erlang backend [16] and Erlang version 21, running on a quad-socket machine with 8-core, 16-hyperthread Xeon® L7555 processors, which yields 64 hardware threads in total.
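The penalty values can be turned into a small worked computation. The text only states the cost of failing at each level, so the accumulation rule in this Python sketch is an assumption: an access first served at a given level pays the penalty of every level above it that it missed, and an access served by main memory pays all three.

```python
from typing import Iterable, Optional

PENALTY = {"L1": 1, "L2": 10, "L3": 100}   # (L1, L2, L3) = (1, 10, 100)
LEVELS = ["L1", "L2", "L3"]

def access_penalty(hit_level: Optional[str]) -> int:
    """Penalty of one access first served at hit_level (None: main memory).

    ASSUMPTION: every level that fails to serve the access adds its penalty.
    """
    missed = LEVELS if hit_level is None else LEVELS[:LEVELS.index(hit_level)]
    return sum(PENALTY[level] for level in missed)

def total_penalty(trace: Iterable[Optional[str]]) -> int:
    """Sum of penalties over a trace of accesses, as counted per core."""
    return sum(access_penalty(h) for h in trace)
```

Under this rule an L3 hit costs 1 + 10 = 11, and a trip to main memory costs 111.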


Fairness. A concern that often arises in parallel execution is fairness: the degree 
of variability when distributing the computing resources among different parallel 
components—here, the simulated cores. Fairness of parallel execution can affect 
the simulation’s accuracy in approximating the intended (or idealized) many- 
core hardware. To ensure fairness of the simulation, we make use of deployment 
components [17] in ABS. 

A Deployment Component (DC) is an ABS execution location that is created with a number of virtual resources (e.g., execution speed, memory use, network bandwidth), which are shared among its deployed objects. Any annotated statement [Cost: x] S decrements the resources of its DC by x and then completes, or it stalls its computation if there are currently not enough resources remaining. In the latter case, the statement S may continue at the next advance of the global symbolic time, when all the resources of the DCs have been renewed, and it eventually completes when its Cost has reached zero.

Table 1. Total cache penalties for the strict and relaxed variations, with and without DC configurations.

Strict with DC | Relaxed with DC | Strict | Relaxed
43068          | 43290           | 39183  | 24956
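The Cost semantics just described can be approximated by a few lines of Python that compute, for one DC, the clock interval in which each annotated statement completes. This is a simplified model under stated assumptions, not the ABS runtime. The DC receives `speed` resources at the start of every interval, a statement consumes what is left, and it stalls until the next renewal if that is not enough.

```python
from typing import List

def completion_times(costs: List[int], speed: int) -> List[int]:
    """Interval (1-based) in which each [Cost: x] statement completes on one DC.

    Simplified model: the DC gets `speed` resources per interval of the global
    symbolic clock; a statement consumes what is left and stalls until renewal.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    t, budget, times = 1, speed, []
    for cost in costs:
        remaining = cost
        while True:
            used = min(budget, remaining)
            budget -= used
            remaining -= used
            if remaining == 0:
                break
            t, budget = t + 1, speed      # stall until resources are renewed
        times.append(t)
    return times
```

With Speed(1) and [Cost: 1] per instruction, a DC completes exactly one instruction per interval. This is the lock-step behaviour used below to keep the simulated cores fair.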

We make use of this resource modeling of ABS to assign equal (fair) resources of virtual execution speed to the simulated cores of the system. Each Core object is deployed onto a separate DC with fixed Speed(1) resources. The processing of each instruction has the same cost [Cost: 1]. This is a generalization, since common processor architectures execute different instructions at different speeds (cycles per instruction); e.g., JUMP is faster than LOAD. As a result, every Core can execute at most one instruction in every time interval of the global symbolic clock, and thus no Core can get too far ahead with processing its own instructions. This problem manifests itself in the parallel simulation of N cores on a physical machine with M cores, where N is vastly greater than M. To test this, we performed a write-congested experiment with a configuration of 20 simulated cores and 3 cache levels, comparing the strict and relaxed variations, with and without the use of deployment components. The results (shown in Table 1) were measured on a quad-core system running ABS-Erlang, counting the total cache penalties of all the cores. For the strict variation, the runs with and without DC have similar penalties; this can be attributed to the lock-step nature of strict bus synchronization, where no cache (and thus no core) can unfairly stride forward. In the relaxed variation, however, where synchronization is less strict, we see that without the fairness imposed by DC the penalties are almost halved (24956 vs. 43290). This means that some cores are allowed to do multiple (successful) write operations while other cores are still waiting on the "backlog" to be simulated. The result is fewer penalties, because there are fewer runtime interleavings of the simulated cores and thus less competition between them.


6 Related Work 


There is in general a significant gap between a formal model and its implementation [18]. SOS [1] succinctly formalizes operational models and is well suited for proofs, but direct implementations of SOS quickly become very inefficient. Executable semantic frameworks such as Redex [19], rewriting logic [20,21], and K [22] reduce this gap, and have been used to develop executable formal models of complex languages like C [23] and Java [24]. The relationship between SOS and rewriting logic semantics has been studied [25] without proposing a general solution for label matching. Bijo et al. implemented their SOS multicore memory model [26] in the rewriting logic system Maude [3] using an orchestrator for label matching, but do not provide a correctness proof with respect to the SOS. Different semantic styles can be modeled and related inside one framework; for example, the correctness of distributed implementations of KLAIM systems in terms of simulation relations has been studied in rewriting logic [27]. Compared to these works on semantics, we implemented an SOS model in a distributed active object setting, and proved the correctness of this implementation.

Correctness-preserving compilation is related to correctness proofs for implementations, and ensures that the low-level representation of a program preserves the properties of the high-level model. Examples of this line of work include type-preserving translations into typed assembly languages [28] and formally verified compilers [29,30], which prove the semantic preservation of a compiler from C to assembly code but leave shared-variable concurrency for future work. In contrast to this line of work, which studies compilation from one language to another, our work focuses on a specific model and its implementation, and specifically targets parallel systems.

Simulation tools for cache coherence protocols can evaluate performance and 
efficiency on different architectures (e.g., gems [31] and gem5 [32]). These tools 
perform evaluations of, e.g., the cache hit/miss ratio and response time, by run- 
ning benchmark programs written as low-level read and write instructions to 
memory. Advanced simulators such as Graphite [33] and Sniper [34] run pro- 
grams on distributed clusters to simulate executions on multicore architectures 
with thousands of cores. Unlike our work, these simulators are not based on a 
formal semantics and correctness proofs. Our work complements these simulators by supporting the executable exploration of design choices from a programmer's perspective rather than from a hardware design perspective. Compared to worst-case response time analysis for concurrent programs on multicore architectures [35], our focus is on the underlying data movement rather than the response time.


7 Conclusion 


We have introduced in this paper a methodology for implementing SOS models in the active object language ABS, and applied this methodology to the implementation of an SOS model of an abstraction of multicore memory systems, resulting in a parallel simulator for these systems. A challenge for this implementation is to correctly implement the synchronization patterns of the SOS rules, which may cross encapsulation barriers in the active objects, and in particular label synchronization on parallel transition steps. We prove the correctness of this particular implementation, exploiting the fact that the ABS model allows for a high-level, coarse-grained semantics. We also investigated further parallelization and the fairness of the ABS model.

The results obtained in this paper provide a promising basis for further development of the ABS model for simulating the execution of (object-oriented) programs on multicore architectures. A first such development concerns an extension of the abstract memory model with data. In particular, having the addresses of the memory locations themselves as data allows modeling and simulating different data layouts of the dynamically generated object structures.
Abstract. Microservices are highly modular and scalable Service Oriented
Architectures. They underpin automated deployment practices like
Continuous Deployment and Autoscaling. In this paper we formalize 
these practices and show that automated deployment — proven undecid- 
able in the general case — is algorithmically treatable for microservices. 
Our key assumption is that the configuration life-cycle of a microservice 
is split into two phases: (i) creation, which entails establishing initial con- 
nections with already available microservices, and (ii) subsequent bind- 
ing/unbinding with other microservices. To illustrate the applicability 
of our approach, we implement an automatic optimal deployment tool 
and compute deployment plans for a realistic microservice architecture, 
modeled in the Abstract Behavioral Specification (ABS) language. 


1 Introduction 


Inspired by service-oriented computing, Microservices structure software appli- 
cations as highly modular and scalable compositions of fine-grained and loosely- 
coupled services [18]. These features support modern software engineering prac- 
tices, like continuous delivery/deployment [30] and application autoscaling [3]. 
Currently, these practices focus on single microservices and do not take advan- 
tage of the information on the interdependencies within an architecture. On 
the contrary, architecture-level deployment supports the global optimization of 
resource usage and avoids “domino” effects due to unstructured scaling actions 
that may cause cascading slowdowns or outages [27,35,39]. 

In this paper, we formalize the problem of automatic deployment and recon- 
figuration (at the architectural level) of microservice systems, proving formal 
properties and presenting an implemented solution. 

In our work, we follow the approach taken by the Aeolus component 
model [13-15], which was used to formally define the problem of deploying 
component-based software systems and to prove that, in the general case, such a
problem is undecidable [15]. The basic idea of Aeolus is to enrich the specification 
of components with a finite state automaton that describes their deployment life 
cycle. Previous work identified decidable fragments of the Aeolus model: e.g., 
© The Author(s) 2019 
removing replication constraints from Aeolus (e.g., used to specify a minimal
number of services connected to a load balancer) makes the deployment problem
decidable, but non-primitive recursive [14]; removing also conflicts (e.g., used to
express the impossibility of deploying two types of components in the same system)
makes the problem PSpace-complete [34] or even poly-time [15], but under the
assumption that every required component can be (re)deployed from scratch.

Our intuition is that the Aeolus model can be adapted to formally reason about
the deployment of microservices. To achieve our goal, we significantly revisit the 
formalization of the deployment problem, replacing Aeolus components with a 
model of microservices. The main difference between our model of microservices 
and Aeolus components lies in the specification of their deployment life cycle. 
Here, instead of using the full power of finite state automata (as in Aeolus and
other TOSCA-compliant deployment models [10]), we assume that microservices
have two states: (i) creation and (ii) binding/unbinding. Concerning creation,
we use strong dependencies to express which microservices must be immediately 
connected to newly created ones. After creation, we use weak dependencies to 
indicate additional microservices that can be bound/unbound. The principle 
that guided this modification comes from state-of-the-art microservice deploy- 
ment technologies like Docker [36] and Kubernetes [29]. In particular, the weak 
and strong dependencies have been inspired by Docker Compose [16] (a lan- 
guage for defining multi-container Docker applications) where it is possible to 
specify different relationships among microservices using, e.g., the depends_on 
(resp. external_links) modalities that force (resp. do not force) a specific startup 
order similarly to our strong (resp. weak) dependencies. Weak dependencies are 
also useful to model horizontal scaling, e.g., a load balancer that is bound to/un- 
bound from many microservice instances during its life cycle. 

In addition, w.r.t. the Aeolus model, we also consider resource/cost-aware 
deployments, taking inspiration from the memory and CPU resources found 
in Kubernetes. Microservice specifications are enriched with the amount of 
resources they need to run. In a deployment, a system of microservices runs 
within a set of computation nodes. Nodes represent computational units (e.g., 
virtual machines in an Infrastructure-as-a-Service Cloud deployment). Each node 
has a cost and a set of resources available to the microservices it hosts. 

Based on the model above, we define the optimal deployment problem as follows:
given an initial microservice system, a set of available nodes, and a new target 
microservice to be deployed, find a sequence of reconfiguration actions that, once 
applied to the initial system, leads to a new deployment that includes the target 
microservice. Such a deployment is expected to be optimal, meaning that the 
total cost (i.e., the sum of the costs) of the nodes used is minimal. We show that 
this problem is decidable by presenting an algorithm working in three phases: 
(1) generate a set of constraints whose solution indicates the microservices to be 
deployed and their distribution over the nodes; (2) generate another set of con- 
straints whose solution indicates the connections to be established; (3) synthesize 
the corresponding deployment plan. The set of constraints includes optimization 
metrics that minimize the overall cost of the computed deployment. 
Fig. 1. Example of microservice deployment (blue boxes: nodes; green boxes: microservices;
continuous lines: the initial configuration; dashed lines: full configuration). (Color
figure online)


The algorithm has NEXPTIME complexity because, in the worst case, the
length of the deployment plan could be exponential in the size of the input.
However, we consider this worst case unrealistic in practice, as the number
of microservices deployable on one node is limited by the available resources.
Under the assumption that each node can host at most a polynomial number
of microservices, the deployment problem is NP-complete and the problem of
deploying a system minimizing its total cost is an NP-optimization problem.
Moreover, having reduced the deployment problem to a set of constraints, we
can exploit state-of-the-art constraint solvers [12,23,24] that are frequently used
in practice to cope with NP-hard problems.

To concretely evaluate our approach, we consider a real-world microservice 
architecture, inspired by the reference email processing pipeline from Iron.io [22]. 
We model that architecture in the Abstract Behavioral Specification (ABS) lan- 
guage, a high-level object-oriented language that supports deployment model- 
ing [31]. We use our technique to compute two types of deployments: an initial 
one, with one instance for each microservice, and a set of deployments to hor- 
izontally scale the system depending on small, medium or large increments in 
the number of emails to be processed. The experimental results are encouraging 
in that we were able to compute deployment plans that add more than 30 new 
microservice instances, assuming availability of hundreds of machines of three 
different types, and guaranteeing optimality. 


2 The Microservice Optimal Deployment Problem 


We model microservice systems as aggregations of components with ports. 
Each port exposes provided and required interfaces. Interfaces describe offered 
and required functionalities. Microservices are connected by means of bindings 
indicating which port provides the functionality required by another port. As 
discussed in the Introduction, we consider two kinds of requirements: strong 
required interfaces, which must already be fulfilled when the microservice is
created, and weak required interfaces, which must be fulfilled at the end of a


354 M. Bravetti et al. 


deployment (or reconfiguration) plan. Microservices are enriched with the spec- 
ification of the resources they need to properly run; such resources are provided 
to the microservices by nodes. Nodes can be seen as the unit of computation 
executing the tasks associated to each microservice. 

As an example, Fig. 1 shows the deployment of a microservice system inspired
by the email processing pipeline that we will discuss in Sect. 3. Here, we consider
a simplified pipeline. A Message
Receiver microservice handles inbound requests, passing them to a Message Ana- 
lyzer that checks the email content and sends the attachments for inspection to 
an Attachment Analyzer. The Message Receiver has a port with a weak required 
interface that can be fulfilled by Message Analyzer instances. This requirement is 
weak, meaning that the Message Receiver can be initially deployed without any 
connection to instances of Message Analyzer. These connections can be estab- 
lished afterwards and reflect the possibility to horizontally scale the application 
by adding/removing instances of Message Analyzer. This last microservice has 
instead a port with a strong required interface that can be fulfilled by Attachment 
Analyzer instances. This requirement is strong to reflect the need to immediately 
connect a Message Analyzer to its Attachment Analyzer. 

Figure 1 presents a reconfiguration that, starting from the initial deploy- 
ment depicted in continuous lines, adds the elements depicted with dashed lines. 
Namely, two new instances of Message Analyzer and a new instance of
Attachment Analyzer are deployed. This is done in order to satisfy numerical 
constraints associated to both required and provided interfaces. For required 
interfaces, the numerical constraints indicate lower bounds to the outgoing bind- 
ings, while for provided interfaces they specify upper bounds to the incoming 
connections. Notice that the constraint ≥ 3 associated with the weak required
interface of Message Receiver is not initially satisfied; this is not problematic 
because constraints on weak interfaces are relevant only at the end of a recon- 
figuration. In the final deployment, such a constraint is satisfied thanks to the 
two new instances of Message Analyzer. These two instances need to be immedi- 
ately connected to an Attachment Analyzer: only one of them can use the initially 
available Attachment Analyzer, because of the constraint ≤ 2 associated with the
corresponding provided interface. Hence, a new instance of Attachment Analyzer 
is added. 

We also model resources: each microservice has associated resources that it 
consumes (see the CPU and RAM quantities associated to the microservices in 
Fig. 1). Resources are provided by nodes, which we represent as containers for the
microservice instances, providing them the resources they require. Notice that
nodes also have costs: the total cost of a deployment is the sum of the costs
of the used nodes (e.g., in the example the total cost is 598 cents per hour, 
corresponding to the cost of 4 nodes: 2 C4 large and 2 C4 xlarge virtual machine 
instances of the Amazon public Cloud). 
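The quoted total can be cross-checked against the node costs. The worked equation below is only a consistency check under two assumptions not stated in the text: both C4 large nodes cost 100, like Node1_large, and both C4 xlarge nodes share the same unknown cost $x$.

```latex
\[
  \underbrace{2 \cdot 100}_{\text{C4 large}} + \underbrace{2x}_{\text{C4 xlarge}} = 598
  \quad\Longrightarrow\quad
  x = \frac{598 - 200}{2} = 199 \ \text{cents per hour}
\]
```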

We now move to the formal definitions. We assume the following disjoint sets: 
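$\mathcal{I}$ for interfaces, $\mathcal{Z}$ for microservices, and a finite set $\mathcal{R}$ for kinds of resources.
We use $\mathbb{N}$ to denote natural numbers, $\mathbb{N}^+$ for $\mathbb{N} \setminus \{0\}$, and $\mathbb{N}^+_\infty$ for $\mathbb{N}^+ \cup \{\infty\}$.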
Definition 1 (Microservice type). The set $\Gamma$ of microservice types, ranged
over by $\mathcal{T}_1, \mathcal{T}_2, \ldots$, contains 5-tuples $\langle P, D_s, D_w, C, R \rangle$ where:


- $P = (\mathcal{I} \rightharpoonup \mathbb{N}^+_\infty)$ are the provided interfaces, defined as a partial function from
interfaces to corresponding numerical constraints (indicating the maximum
number of connected microservices);

- $D_s = (\mathcal{I} \rightharpoonup \mathbb{N}^+)$ are the strong required interfaces, defined as a partial function
from interfaces to corresponding numerical constraints (indicating the
minimum number of connected microservices);

- $D_w = (\mathcal{I} \rightharpoonup \mathbb{N})$ are the weak required interfaces (defined as the strong ones,
with the difference that the constraint 0 can also be used, indicating that it is
not strictly necessary to connect microservices);

- $C \subseteq \mathcal{I}$ are the conflicting interfaces;

- $R = (\mathcal{R} \to \mathbb{N})$ specifies resource consumption, defined as a total function
from resources to corresponding quantities indicating the amount of required
resources.


We assume the sets $\mathrm{dom}(D_s)$, $\mathrm{dom}(D_w)$ and $C$ to be pairwise disjoint.¹


Notation: given a microservice type $\mathcal{T} = \langle P, D_s, D_w, C, R \rangle$, we use the following
postfix projections .prov, .reqs, .reqw, .conf and .res to decompose it; e.g., $\mathcal{T}.\mathit{reqw}$
returns the partial function associating arities to weak required interfaces. In
our example, for instance, the Message Receiver microservice type is such that
Message Receiver.reqw(MA) = 3 and Message Receiver.res(RAM) = 4. When the
numerical constraints are not explicitly indicated, we assume as default value
$\infty$ for provided interfaces (i.e., they can satisfy an unlimited number of ports
requiring the same interface) and 1 for required interfaces (i.e., one connection
with a port providing the same interface is sufficient).
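To make the projections and default arities concrete, here is a minimal Python sketch of Definition 1. The class `MicroserviceType`, the helper `make_type` and the dictionary encoding are illustrative assumptions, not part of the formalization. Only the Message Receiver figures (weak requirement MA with arity 3, 4 units of RAM) and the Attachment Analyzer bound 2 come from the text; resources that are not listed are assumed to be consumed in quantity 0.

```python
import math
from dataclasses import dataclass, field

INF = math.inf  # plays the role of the unbounded arity (infinity) of provided interfaces


@dataclass
class MicroserviceType:
    """Hypothetical encoding of a microservice type <P, Ds, Dw, C, R> (Definition 1)."""
    name: str
    prov: dict = field(default_factory=dict)  # P:  interface -> max incoming bindings
    reqs: dict = field(default_factory=dict)  # Ds: interface -> min bindings at creation
    reqw: dict = field(default_factory=dict)  # Dw: interface -> min bindings at the end
    conf: frozenset = frozenset()             # C:  conflicting interfaces
    res: dict = field(default_factory=dict)   # R:  resource -> consumed amount (unlisted = 0)

    def __post_init__(self):
        # Definition 1 requires dom(Ds), dom(Dw) and C to be pairwise disjoint.
        s, w, c = set(self.reqs), set(self.reqw), set(self.conf)
        if (s & w) or (s & c) or (w & c):
            raise ValueError(f"{self.name}: dom(Ds), dom(Dw) and C overlap")


def make_type(name, prov=(), reqs=(), reqw=(), conf=(), res=None):
    """Build a type. A bare interface name gets the default arity (INF if provided,
    1 if required); a pair (iface, n) sets the arity explicitly."""
    def arities(spec, default):
        out = {}
        for item in spec:
            iface, n = item if isinstance(item, tuple) else (item, default)
            out[iface] = n
        return out
    return MicroserviceType(name, arities(prov, INF), arities(reqs, 1),
                            arities(reqw, 1), frozenset(conf), dict(res or {}))


# Simplified Fig. 1 pipeline.
receiver = make_type("MessageReceiver", reqw=[("MA", 3)], res={"RAM": 4})
analyzer = make_type("MessageAnalyzer", prov=["MA"], reqs=["AA"])
attachment = make_type("AttachmentAnalyzer", prov=[("AA", 2)])
print(receiver.reqw["MA"], receiver.res["RAM"], analyzer.prov["MA"], analyzer.reqs["AA"])  # 3 4 inf 1
```

The explicit-pair syntax mirrors the figure's annotations, and the bare-name form applies the defaults described above.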

Inspired by [14], we allow a microservice to specify a conflicting interface 
that, intuitively, forbids the deployment of other microservices providing the 
same interface. Conflicting interfaces can be used to express conflicts among 
microservices, preventing both of them from being present at the same time, or cases
in which only one microservice instance can be deployed (e.g., a consistent and
available microservice that cannot be replicated).

Since the requirements associated with strong interfaces must be immediately 
satisfied, it is possible to deploy a configuration with circular dependencies only 
if at least one weak required interface is involved in the cycle. In fact, having a
cycle with only strong required interfaces would require deploying all the microservices
involved in the cycle simultaneously. We now formalize a well-formedness
condition on microservice types to guarantee the absence of such configurations. 


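Definition 2 (Well-formed Universe). Given a finite set of microservice
types $U$ (that we also call universe), the strong dependency graph of $U$ is
as follows: $\mathcal{G}(U) = (U, V)$ with $V = \{(\mathcal{T}, \mathcal{T}') \mid \mathcal{T}, \mathcal{T}' \in U \wedge \exists p \in \mathcal{I}.\ p \in \mathrm{dom}(\mathcal{T}.\mathit{reqs}) \cap \mathrm{dom}(\mathcal{T}'.\mathit{prov})\}$.
The universe $U$ is well-formed if $\mathcal{G}(U)$ is acyclic.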


¹ Given a partial function $f$, we use $\mathrm{dom}(f)$ to denote the domain of $f$, i.e., the set
$\{e \mid \exists e' : (e, e') \in f\}$.
In the following, we always assume universes to be well-formed. Well-formedness
does not prevent the specification of microservice systems with circular depen- 
dencies, which are captured by cycles with at least one weak required interface. 
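Well-formedness can be checked mechanically. The Python sketch below builds the edges of $\mathcal{G}(U)$ from Definition 2 and searches for a cycle with a depth-first search. It assumes a hypothetical encoding in which each type name maps to a pair (strong required interfaces, provided interfaces). Weak requirements are left out on purpose because they are not part of $\mathcal{G}(U)$.

```python
def strong_edges(universe):
    """Edges (T, T') of G(U): some interface p is in dom(T.reqs) and in dom(T'.prov).

    `universe` maps a type name to a pair (reqs, prov) of interface -> arity dicts
    (an illustrative encoding); weak requirements are not part of G(U).
    """
    return {(t, t2)
            for t, (reqs_t, _) in universe.items()
            for t2, (_, prov_t2) in universe.items()
            if set(reqs_t) & set(prov_t2)}


def is_well_formed(universe):
    """Definition 2: U is well-formed iff G(U) is acyclic (three-colour DFS)."""
    succ = {t: [] for t in universe}
    for t, t2 in strong_edges(universe):
        succ[t].append(t2)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(universe, WHITE)

    def acyclic_from(t):
        colour[t] = GREY                      # t is on the current DFS path
        for u in succ[t]:
            if colour[u] == GREY:             # back edge: a cycle of strong requirements
                return False
            if colour[u] == WHITE and not acyclic_from(u):
                return False
        colour[t] = BLACK                     # no cycle is reachable from t
        return True

    return all(colour[t] != WHITE or acyclic_from(t) for t in universe)


pipeline = {
    "MessageReceiver": ({}, {}),                           # its MA requirement is weak
    "MessageAnalyzer": ({"AA": 1}, {"MA": float("inf")}),
    "AttachmentAnalyzer": ({}, {"AA": 2}),
}
print(is_well_formed(pipeline))  # True
```

A type that strongly requires an interface it also provides yields a self-loop in $\mathcal{G}(U)$, which this check reports as a cycle.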


Definition 3 (Nodes). The set $\mathcal{O}$ of nodes is ranged over by $o_1, o_2, \ldots$ We
assume the following information to be associated to each node $o$ in $\mathcal{O}$.


- A function $R = (\mathcal{R} \to \mathbb{N})$ that specifies node resource availability: we use
$o.\mathit{res}$ to denote such a function.
- A value in $\mathbb{N}$ that specifies node cost: we use $o.\mathit{cost}$ to denote such a value.


As an example, in Fig. 1, the node Node1_large is such that Node1_large.res(RAM) =
4 and Node1_large.cost = 100.

We now define configurations that describe systems composed of microservice 
instances and bindings that interconnect them. A configuration, ranged over by 
$\mathcal{C}_1, \mathcal{C}_2, \ldots$, is given by a set of microservice types, a set of deployed microservices
(with their associated type), and a set of bindings. Formally: 


Definition 4 (Configuration). A configuration $\mathcal{C}$ is a 4-tuple $\langle Z, T, N, B \rangle$
where:


- $Z \subseteq \mathcal{Z}$ is the set of the currently deployed microservices;

- $T = (Z \to \Gamma)$ are the microservice types, defined as a function from deployed
microservices to microservice types;

- $N = (Z \to \mathcal{O})$ are the microservice nodes, defined as a function from deployed
microservices to the nodes that host them;

- $B \subseteq \mathcal{I} \times Z \times Z$ is the set of bindings, namely triples composed of an interface,
the microservice that requires that interface, and the microservice that
provides it; we assume that, for $(p, z_1, z_2) \in B$, the two microservices $z_1$ and
$z_2$ are distinct and $p \in (\mathrm{dom}(T(z_1).\mathit{reqs}) \cup \mathrm{dom}(T(z_1).\mathit{reqw})) \cap \mathrm{dom}(T(z_2).\mathit{prov})$.


In our example, if we use mr to refer to the instance of Message Receiver, and 
ma for the initially available Message Analyzer, we will have the binding (MA, 
mr, ma). Moreover, concerning the microservice placement function N, we have
N(mr) = Node1_large and N(ma) = Node2_xlarge.

We are now ready to formalize the notion of configuration correctness.
We first define a provisional correctness, considering only constraints on strong 
required and provided interfaces, and then we define a general notion of config- 
uration correctness, considering also weak required interfaces and conflicts. The 
former is intended for transient configurations traversed during the execution of 
a reconfiguration, while the latter for the final configuration. 


Definition 5 (Provisionally correct configuration). A configuration $\mathcal{C} = \langle Z, T, N, B \rangle$
is provisionally correct if, for each node $o \in \mathrm{ran}(N)$, it holds²

$$\forall r \in \mathcal{R}.\; o.\mathit{res}(r) \ge \sum_{z \in Z,\; N(z) = o} T(z).\mathit{res}(r)$$

and, for each microservice $z \in Z$, both of the following conditions hold:


- $(p \mapsto n) \in T(z).\mathit{reqs}$ implies that there exist $n$ distinct microservices
$z_1, \ldots, z_n \in Z \setminus \{z\}$ such that, for every $1 \le i \le n$, we have $(p, z, z_i) \in B$;

- $(p \mapsto n) \in T(z).\mathit{prov}$ implies that there exist no $m$ distinct microservices
$z_1, \ldots, z_m \in Z \setminus \{z\}$, with $m > n$, such that, for every $1 \le i \le m$, we have
$(p, z_i, z) \in B$.


² Given a (partial) function $f$, we use $\mathrm{ran}(f)$ to denote the range of $f$, i.e., the function
image set $\{f(e) \mid e \in \mathrm{dom}(f)\}$.


Definition 6 (Correct configuration). A configuration $\mathcal{C} = \langle Z, T, N, B \rangle$ is
correct if $\mathcal{C}$ is provisionally correct and, for each microservice $z \in Z$, both of the
following conditions hold:


- $(p \mapsto n) \in T(z).\mathit{reqw}$ implies that there exist $n$ distinct microservices
$z_1, \ldots, z_n \in Z \setminus \{z\}$ such that, for every $1 \le i \le n$, we have $(p, z, z_i) \in B$;
- $p \in T(z).\mathit{conf}$ implies that, for each $z' \in Z \setminus \{z\}$, we have $p \notin \mathrm{dom}(T(z').\mathit{prov})$.


Notice that, in the example in Fig. 1, the initial configuration (in continuous 
lines) is only provisionally correct in that the weak required interface MA (with 
arity 3) of the Message Receiver is not satisfied (because there is only one outgoing 
binding). The full configuration, including also the elements in dashed lines,
is instead correct: all the constraints associated to the interfaces are satisfied. 
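Definitions 5 and 6 translate directly into two checks. The Python sketch below uses a hypothetical dictionary encoding and replays the Fig. 1 example. Node1_large's 4 units of RAM and the Message Receiver's 4 units of RAM come from the text. The RAM of Message Analyzer and Attachment Analyzer, the capacity of the xlarge nodes, the placement of the initial Attachment Analyzer and the extra node `Node3_xlarge` are all made up for illustration.

```python
from collections import Counter


def provisionally_correct(Z, T, N, B, types, nodes):
    """Definition 5. Z: set of microservices; T, N: dicts z -> type name / node name;
    B: set of (p, requirer, provider); types: name -> dict with prov/reqs/reqw/conf/res;
    nodes: node name -> resource capacities. The encoding is an assumption of this sketch."""
    for o in {N[z] for z in Z}:                        # nodes in ran(N)
        hosted = [types[T[z]]["res"] for z in Z if N[z] == o]
        for r in set(nodes[o]).union(*hosted):
            if sum(h.get(r, 0) for h in hosted) > nodes[o].get(r, 0):
                return False
    out_deg = Counter((p, z1) for p, z1, _ in B)      # distinct providers bound to z1 on p
    in_deg = Counter((p, z2) for p, _, z2 in B)       # distinct requirers bound to z2 on p
    for z in Z:
        t = types[T[z]]
        if any(out_deg[(p, z)] < n for p, n in t["reqs"].items()):
            return False
        if any(in_deg[(p, z)] > n for p, n in t["prov"].items()):
            return False
    return True


def correct(Z, T, N, B, types, nodes):
    """Definition 6: provisional correctness plus weak requirements and conflicts."""
    if not provisionally_correct(Z, T, N, B, types, nodes):
        return False
    out_deg = Counter((p, z1) for p, z1, _ in B)
    for z in Z:
        t = types[T[z]]
        if any(out_deg[(p, z)] < n for p, n in t["reqw"].items()):
            return False
        if any(p in types[T[z2]]["prov"] for p in t["conf"] for z2 in Z if z2 != z):
            return False
    return True


types = {
    "MR": {"prov": {}, "reqs": {}, "reqw": {"MA": 3}, "conf": set(), "res": {"RAM": 4}},
    "MA": {"prov": {"MA": float("inf")}, "reqs": {"AA": 1}, "reqw": {}, "conf": set(), "res": {"RAM": 2}},
    "AA": {"prov": {"AA": 2}, "reqs": {}, "reqw": {}, "conf": set(), "res": {"RAM": 2}},
}
nodes = {"Node1_large": {"RAM": 4}, "Node2_xlarge": {"RAM": 8}, "Node3_xlarge": {"RAM": 8}}
T = {"mr": "MR", "ma": "MA", "aa": "AA", "ma2": "MA", "ma3": "MA", "aa2": "AA"}
N = {"mr": "Node1_large", "ma": "Node2_xlarge", "aa": "Node2_xlarge",
     "ma2": "Node3_xlarge", "ma3": "Node3_xlarge", "aa2": "Node3_xlarge"}
Z0, B0 = {"mr", "ma", "aa"}, {("MA", "mr", "ma"), ("AA", "ma", "aa")}
Z1 = set(T)
B1 = B0 | {("AA", "ma2", "aa"), ("AA", "ma3", "aa2"), ("MA", "mr", "ma2"), ("MA", "mr", "ma3")}
print(provisionally_correct(Z0, T, N, B0, types, nodes), correct(Z0, T, N, B0, types, nodes))  # True False
print(correct(Z1, T, N, B1, types, nodes))  # True
```

Because B is a set of triples with distinct endpoints, counting triples per (interface, microservice) pair counts distinct partners, which is what the definitions quantify over.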

We now formalize how configurations evolve by means of atomic actions. 


Definition 7 (Actions). The set $\mathcal{A}$ contains the following actions:


- $bind(p, z_1, z_2)$ where $z_1, z_2 \in \mathcal{Z}$, with $z_1 \neq z_2$, and $p \in \mathcal{I}$: add a binding between
$z_1$ and $z_2$ on port $p$ (which is supposed to be a weak-require port of $z_1$ and a
provide port of $z_2$);

- $unbind(p, z_1, z_2)$ where $z_1, z_2 \in \mathcal{Z}$, with $z_1 \neq z_2$, and $p \in \mathcal{I}$: remove the specified
binding on $p$ (which is supposed to be a weak required interface of $z_1$ and a
provide port of $z_2$);

- $new(z, \mathcal{T}, o, B_s)$ where $z \in \mathcal{Z}$, $\mathcal{T} \in \Gamma$, $o \in \mathcal{O}$ and $B_s = (\mathrm{dom}(\mathcal{T}.\mathit{reqs}) \to 2^{\mathcal{Z} - \{z\}})$,
with $B_s$ (representing bindings from strong required interfaces in $\mathcal{T}$ to sets of
microservices) being such that, for each $p \in \mathrm{dom}(\mathcal{T}.\mathit{reqs})$, it holds $|B_s(p)| \ge
\mathcal{T}.\mathit{reqs}(p)$: add a new microservice $z$ of type $\mathcal{T}$ hosted in $o$ and bind each of
its strong required interfaces to a set of microservices as described by $B_s$;³

- $del(z)$ where $z \in \mathcal{Z}$: remove the microservice $z$ from the configuration and all
bindings involving it.


In our example, assuming that the initially available Attachment Analyzer 
is named aa, we have that the action to create the initial instance of Message 
Analyzer is new(ma, MessageAnalyzer, Node2_xlarge, (AA ↦ {aa})). Notice that it
is necessary to establish the binding with the Attachment Analyzer because of 
the corresponding strong required interface. 

The execution of actions can now be formalized using a labeled transition 
system on configurations, which uses actions as labels. 


³ Given sets $S$ and $S'$ we use: $2^S$ to denote the power set of $S$, i.e., the set $\{S' \mid S' \subseteq S\}$;
$S - S'$ to denote set difference; and $|S|$ to denote the cardinality of $S$.
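Definition 8 (Reconfigurations). Reconfigurations are denoted by transitions
$\mathcal{C} \xrightarrow{\alpha} \mathcal{C}'$ meaning that the execution of $\alpha \in \mathcal{A}$ on the configuration $\mathcal{C}$ produces a
new configuration $\mathcal{C}'$. The transitions from a configuration $\mathcal{C} = \langle Z, T, N, B \rangle$ are
defined as follows:

$$\mathcal{C} \xrightarrow{bind(p, z_1, z_2)} \langle Z, T, N, B \cup \{(p, z_1, z_2)\} \rangle \quad \text{if } (p, z_1, z_2) \notin B \text{ and } p \in \mathrm{dom}(T(z_1).\mathit{reqw}) \cap \mathrm{dom}(T(z_2).\mathit{prov})$$

$$\mathcal{C} \xrightarrow{unbind(p, z_1, z_2)} \langle Z, T, N, B \setminus \{(p, z_1, z_2)\} \rangle \quad \text{if } (p, z_1, z_2) \in B \text{ and } p \in \mathrm{dom}(T(z_1).\mathit{reqw}) \cap \mathrm{dom}(T(z_2).\mathit{prov})$$

$$\mathcal{C} \xrightarrow{new(z, \mathcal{T}, o, B_s)} \langle Z \cup \{z\}, T', N', B' \rangle \quad \text{if } z \notin Z \text{ and } \forall p \in \mathrm{dom}(\mathcal{T}.\mathit{reqs}).\ \forall z' \in B_s(p).\ p \in \mathrm{dom}(T(z').\mathit{prov}) \text{ and } T' = T \cup \{(z \mapsto \mathcal{T})\} \text{ and } N' = N \cup \{(z \mapsto o)\} \text{ and } B' = B \cup \{(p, z, z') \mid z' \in B_s(p)\}$$

$$\mathcal{C} \xrightarrow{del(z)} \langle Z \setminus \{z\}, T', N', B' \rangle \quad \text{if } T' = \{(z' \mapsto \mathcal{T}) \in T \mid z \neq z'\} \text{ and } N' = \{(z' \mapsto o) \in N \mid z \neq z'\} \text{ and } B' = \{(p, z_1, z_2) \in B \mid z \notin \{z_1, z_2\}\}$$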
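The four transition rules can be executed directly. The Python sketch below, with a hypothetical tuple encoding of configurations, applies one action at a time and raises an error when the side conditions of Definitions 7 and 8 fail. It does not check provisional correctness along the way, which a deployment plan additionally requires. It then replays the plan described in the text for Fig. 1. The node `Node3_xlarge` and the instance names `aa2`, `ma2`, `ma3` are illustrative, and so is the choice of which Attachment Analyzer each new Message Analyzer binds to.

```python
def apply_action(config, action, types):
    """Execute one action per Definition 8 on config = (Z, T, N, B).
    Raises ValueError when the transition is not enabled.
    types: name -> {"prov", "reqs", "reqw"} (hypothetical encoding)."""
    Z, T, N, B = config
    kind, *args = action
    if kind in ("bind", "unbind"):
        p, z1, z2 = args
        if z1 == z2 or p not in types[T[z1]]["reqw"] or p not in types[T[z2]]["prov"]:
            raise ValueError(f"{action} not enabled")
        if kind == "bind":
            if (p, z1, z2) in B:
                raise ValueError(f"{action} not enabled")
            return Z, T, N, B | {(p, z1, z2)}
        if (p, z1, z2) not in B:
            raise ValueError(f"{action} not enabled")
        return Z, T, N, B - {(p, z1, z2)}
    if kind == "new":
        z, t, o, bs = args
        reqs = types[t]["reqs"]
        ok = (z not in Z and set(bs) == set(reqs)           # Bs is defined on dom(T.reqs)
              and all(len(bs[p]) >= reqs[p] and z not in bs[p] for p in reqs)
              and all(z2 in Z and p in types[T[z2]]["prov"] for p in reqs for z2 in bs[p]))
        if not ok:
            raise ValueError(f"{action} not enabled")
        return (Z | {z}, {**T, z: t}, {**N, z: o},
                B | {(p, z, z2) for p in reqs for z2 in bs[p]})
    if kind == "del":
        (z,) = args
        return (Z - {z}, {k: v for k, v in T.items() if k != z},
                {k: v for k, v in N.items() if k != z},
                {b for b in B if z not in b[1:]})    # drop bindings touching z
    raise ValueError(f"unknown action {kind}")


types = {
    "MR": {"prov": {}, "reqs": {}, "reqw": {"MA": 3}},
    "MA": {"prov": {"MA": float("inf")}, "reqs": {"AA": 1}, "reqw": {}},
    "AA": {"prov": {"AA": 2}, "reqs": {}, "reqw": {}},
}
config = ({"mr", "ma", "aa"}, {"mr": "MR", "ma": "MA", "aa": "AA"},
          {"mr": "Node1_large", "ma": "Node2_xlarge", "aa": "Node2_xlarge"},
          {("MA", "mr", "ma"), ("AA", "ma", "aa")})
plan = [
    ("new", "aa2", "AA", "Node3_xlarge", {}),
    ("new", "ma2", "MA", "Node3_xlarge", {"AA": {"aa"}}),
    ("new", "ma3", "MA", "Node3_xlarge", {"AA": {"aa2"}}),
    ("bind", "MA", "mr", "ma2"),
    ("bind", "MA", "mr", "ma3"),
]
for action in plan:
    config = apply_action(config, action, types)
print(sorted(config[0]))  # ['aa', 'aa2', 'ma', 'ma2', 'ma3', 'mr']
```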


A deployment plan is simply a sequence of actions that transform a pro- 
visionally correct configuration (without violating provisional correctness along 
the way) and, finally, reach a correct configuration. 


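Definition 9 (Deployment plan). A deployment plan $\mathcal{P}$ from a provisionally
correct configuration $\mathcal{C}_0$ is a sequence of actions $\alpha_1, \ldots, \alpha_m$ such that:


- there exist provisionally correct configurations $\mathcal{C}_1, \ldots, \mathcal{C}_m$, with $\mathcal{C}_{i-1} \xrightarrow{\alpha_i} \mathcal{C}_i$
for $1 \le i \le m$, and
- $\mathcal{C}_m$ is a correct configuration.


Deployment plans are also denoted with $\mathcal{C}_0 \xrightarrow{\alpha_1} \mathcal{C}_1 \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_m} \mathcal{C}_m$.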


In our example, a deployment plan that reconfigures the initial provisionally 
correct configuration into the final correct one is as follows: a new action to 
create the new instance of Attachment Analyzer, followed by two new actions 
for the new Message Analyzers (as commented above, the connection with the 
Attachment Analyzer is part of these new actions), and finally two bind actions 
to connect the Message Receiver to the two new instances of Message Analyzer. 

We now have all the ingredients to define the optimal deployment problem, 
that is our main concern: given a universe of microservice types, a set of available 
nodes and an initial configuration, we want to know whether and how it is 
possible to deploy at least one microservice of a given microservice type $\mathcal{T}$ while
minimizing the overall cost of the nodes hosting the deployed microservices.


Definition 10 (Optimal deployment problem). The optimal deployment
problem has, as input, a finite well-formed universe $U$ of microservice types, a
finite set of available nodes $O$, an initial provisionally correct configuration $\mathcal{C}_0$
and a microservice type $\mathcal{T}_t \in U$. The output is:

- A deployment plan $\mathcal{P} = \mathcal{C}_0 \xrightarrow{\alpha_1} \mathcal{C}_1 \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_m} \mathcal{C}_m$ such that
  - for all $\mathcal{C}_i = \langle Z_i, T_i, N_i, B_i \rangle$, with $1 \le i \le m$, it holds $\forall z \in Z_i.\ T_i(z) \in U \wedge N_i(z) \in O$, and
  - $\mathcal{C}_m = \langle Z_m, T_m, N_m, B_m \rangle$ satisfies $\exists z \in Z_m : T_m(z) = \mathcal{T}_t$;

  if there exists one. In particular, among all deployment plans satisfying
  the constraints above, one that minimizes $\sum_{o \in O,\, (\exists z.\, N_m(z) = o)} o.\mathit{cost}$ (i.e., the
  overall cost of nodes in the last configuration $\mathcal{C}_m$) is outputted.
- no (stating that no such plan exists), otherwise.


We are finally ready to state our main result on the decidability of the opti- 
mal deployment problem. To prove the result we describe an approach that splits 
the problem into three incremental phases: (1) the first phase checks if there is a
possible solution and assigns microservices to deployment nodes, (2) the inter- 
mediate phase computes how the microservices need to be connected to each 
other, and (3) the final phase synthesizes the corresponding deployment plan. 


Theorem 1. The optimal deployment problem is decidable. 


Proof. The proof is in the form of an algorithm that solves the optimal deploy- 
ment problem. We assume that the input to the problem to be solved is given 
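by $U$ (the microservice types), $O$ (the set of available nodes), $\mathcal{C}_0$ (the initial
provisionally correct configuration), and $\mathcal{T}_t \in U$ (the target microservice type).
We use $\mathcal{I}(U)$ to denote the set of interfaces used in the considered microservice
types, namely $\mathcal{I}(U) = \bigcup_{\mathcal{T} \in U} \mathrm{dom}(\mathcal{T}.\mathit{reqs}) \cup \mathrm{dom}(\mathcal{T}.\mathit{reqw}) \cup \mathrm{dom}(\mathcal{T}.\mathit{prov}) \cup \mathcal{T}.\mathit{conf}$.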
The algorithm is based on three phases. 

Phase 1 The first phase consists of the generation of a set of constraints that, 
once solved, indicates how many instances should be created for each microservice
type $\mathcal{T}$ (denoted with $inst(\mathcal{T})$), how many of them should be deployed on
node $o$ (denoted with $inst(\mathcal{T}, o)$), and how many bindings should be established
for each interface $p$ between instances of type $\mathcal{T}$ (considering both weak and strong
required interfaces) and instances of type $\mathcal{T}'$ (denoted with $bind(p, \mathcal{T}, \mathcal{T}')$).
We also generate an optimization function that guarantees that the generated 
configuration is minimal w.r.t. its total cost. 

We now incrementally report the generated constraints. The first group of 
constraints deals with the number of bindings: 


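$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U,\, p \in \mathrm{dom}(\mathcal{T}.\mathit{reqs})} \mathcal{T}.\mathit{reqs}(p) \cdot inst(\mathcal{T}) \le \sum_{\mathcal{T}' \in U} bind(p, \mathcal{T}, \mathcal{T}') \tag{1a}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U,\, p \in \mathrm{dom}(\mathcal{T}.\mathit{reqw})} \mathcal{T}.\mathit{reqw}(p) \cdot inst(\mathcal{T}) \le \sum_{\mathcal{T}' \in U} bind(p, \mathcal{T}, \mathcal{T}') \tag{1b}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U,\, \mathcal{T}.\mathit{prov}(p) < \infty} \mathcal{T}.\mathit{prov}(p) \cdot inst(\mathcal{T}) \ge \sum_{\mathcal{T}' \in U} bind(p, \mathcal{T}', \mathcal{T}) \tag{1c}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U,\, \mathcal{T}.\mathit{prov}(p) = \infty} inst(\mathcal{T}) = 0 \Rightarrow \sum_{\mathcal{T}' \in U} bind(p, \mathcal{T}', \mathcal{T}) = 0 \tag{1d}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U,\, p \notin \mathrm{dom}(\mathcal{T}.\mathit{prov})} \sum_{\mathcal{T}' \in U} bind(p, \mathcal{T}', \mathcal{T}) = 0 \tag{1e}$$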
Constraints 1a and 1b guarantee that there are enough bindings to satisfy all the
required interfaces, considering both strong and weak requirements. Symmetrically,
constraint 1c guarantees that the number of bindings is not greater than
the total available capacity, computed as the sum of the single capacities of each
provided interface. In case the capacity is unbounded (i.e., $\infty$), it is sufficient
to have at least one instance that activates such a port to support any possible
requirement (see constraint 1d). Finally, constraint 1e guarantees that no binding
on an interface $p$ is established towards microservice types that do not
provide $p$.
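Given candidate values for $inst$ and $bind$, constraints 1a–1e can be evaluated one by one. The Python sketch below is only a checker for such an assignment, not the constraint solver the approach relies on. It assumes a hypothetical encoding in which `bind` maps triples (p, T, T') to counts and missing keys mean 0. The example uses the simplified Fig. 1 pipeline after scaling (one Message Receiver, three Message Analyzers, two Attachment Analyzers).

```python
import math

INF = math.inf


def check_binding_constraints(types, inst, bind):
    """Evaluate constraints (1a)-(1e) on candidate values (a checker, not a solver).

    types: name -> {"prov", "reqs", "reqw", "conf"} (hypothetical encoding)
    inst:  name -> inst(T);  bind: (p, T, T2) -> bind(p, T, T2), missing keys meaning 0.
    """
    ifaces = {p for t in types.values() for k in ("prov", "reqs", "reqw", "conf") for p in t[k]}

    def outgoing(p, t):   # sum over T' of bind(p, T, T')
        return sum(bind.get((p, t, t2), 0) for t2 in types)

    def incoming(p, t):   # sum over T' of bind(p, T', T)
        return sum(bind.get((p, t2, t), 0) for t2 in types)

    for p in ifaces:
        for name, t in types.items():
            if p in t["reqs"] and t["reqs"][p] * inst[name] > outgoing(p, name):   # (1a)
                return False
            if p in t["reqw"] and t["reqw"][p] * inst[name] > outgoing(p, name):   # (1b)
                return False
            cap = t["prov"].get(p)
            if cap is None:
                if incoming(p, name) != 0:                                          # (1e)
                    return False
            elif cap < INF:
                if cap * inst[name] < incoming(p, name):                            # (1c)
                    return False
            elif inst[name] == 0 and incoming(p, name) != 0:                       # (1d)
                return False
    return True


types = {
    "MR": {"prov": {}, "reqs": {}, "reqw": {"MA": 3}, "conf": set()},
    "MA": {"prov": {"MA": INF}, "reqs": {"AA": 1}, "reqw": {}, "conf": set()},
    "AA": {"prov": {"AA": 2}, "reqs": {}, "reqw": {}, "conf": set()},
}
bind = {("MA", "MR", "MA"): 3, ("AA", "MA", "AA"): 3}
print(check_binding_constraints(types, {"MR": 1, "MA": 3, "AA": 2}, bind))  # True
print(check_binding_constraints(types, {"MR": 1, "MA": 3, "AA": 1}, bind))  # False, by (1c)
```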

The second group of constraints deals with the number of instances of 
microservices to be deployed. 


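$$inst(\mathcal{T}_t) \ge 1 \tag{2a}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U,\, p \in \mathcal{T}.\mathit{conf}} \; \bigwedge_{\mathcal{T}' \in U - \{\mathcal{T}\},\, p \in \mathrm{dom}(\mathcal{T}'.\mathit{prov})} inst(\mathcal{T}) > 0 \Rightarrow inst(\mathcal{T}') = 0 \tag{2b}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U,\, p \in \mathcal{T}.\mathit{conf} \wedge p \in \mathrm{dom}(\mathcal{T}.\mathit{prov})} inst(\mathcal{T}) \le 1 \tag{2c}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U} \; \bigwedge_{\mathcal{T}' \in U - \{\mathcal{T}\}} bind(p, \mathcal{T}, \mathcal{T}') \le inst(\mathcal{T}) \cdot inst(\mathcal{T}') \tag{2d}$$

$$\bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{\mathcal{T} \in U} bind(p, \mathcal{T}, \mathcal{T}) \le inst(\mathcal{T}) \cdot (inst(\mathcal{T}) - 1) \tag{2e}$$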


The first constraint 2a guarantees the presence of at least one instance of 
the target microservice. Constraint 2b guarantees that no two instances of dif- 
ferent types will be created if one activates a conflict on an interface provided 
by the other one. Constraint 2c considers the other case in which a type activates
the same interface both in conflicting and provided modality: in this case,
at most one instance of such type can be created. Finally, the constraints 2d 
and 2e guarantee that there are enough pairs of distinct instances to establish 
all the necessary bindings. Two distinct constraints are used: the first one deals 
with bindings between microservices of two different types, the second one with 
bindings between microservices of the same type. 
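The instance-level constraints 2a–2e can be checked in the same spirit. The Python sketch below evaluates them on a candidate assignment under the same hypothetical encoding. The `Store` type, which both provides and conflicts on a `DB` interface, is invented to exercise the single-instance case of 2c.

```python
def check_instance_constraints(types, inst, bind, target):
    """Evaluate constraints (2a)-(2e) on candidate inst/bind values.
    types: name -> {"prov", "conf"}; bind: (p, T, T2) -> count (missing = 0)."""
    if inst[target] < 1:                                            # (2a)
        return False
    for name, t in types.items():
        for p in t["conf"]:
            if p in t["prov"] and inst[name] > 1:                   # (2c)
                return False
            for other, t2 in types.items():                         # (2b)
                if other != name and p in t2["prov"] and inst[name] > 0 and inst[other] != 0:
                    return False
    for (p, a, b), k in bind.items():                               # (2d) and (2e)
        pairs = inst[a] * inst[b] if a != b else inst[a] * (inst[a] - 1)
        if k > pairs:                                               # not enough distinct pairs
            return False
    return True


types = {
    "MR": {"prov": {}, "conf": set()},
    "MA": {"prov": {"MA": float("inf")}, "conf": set()},
    "AA": {"prov": {"AA": 2}, "conf": set()},
    "Store": {"prov": {"DB": 1}, "conf": {"DB"}},   # hypothetical non-replicable service
}
inst = {"MR": 1, "MA": 3, "AA": 2, "Store": 1}
bind = {("MA", "MR", "MA"): 3, ("AA", "MA", "AA"): 3}
print(check_instance_constraints(types, inst, bind, "MR"))  # True
```

Missing `bind` entries need no check, because 0 never exceeds the number of available pairs.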

The last group of constraints deals with the distribution of microservice 
instances over the available nodes O. 


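$$\bigwedge_{\mathcal{T} \in U} inst(\mathcal{T}) = \sum_{o \in O} inst(\mathcal{T}, o) \tag{3a}$$

$$\bigwedge_{r \in \mathcal{R}} \; \bigwedge_{o \in O} \; \sum_{\mathcal{T} \in U} inst(\mathcal{T}, o) \cdot \mathcal{T}.\mathit{res}(r) \le o.\mathit{res}(r) \tag{3b}$$

$$\bigwedge_{o \in O} \Big( \sum_{\mathcal{T} \in U} inst(\mathcal{T}, o) > 0 \Big) \Leftrightarrow used(o) \tag{3c}$$

$$\min \sum_{o \in O,\; used(o)} o.\mathit{cost} \tag{3d}$$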
Constraint 3a simply formalizes the relationship between the variables $inst(\mathcal{T})$
and $inst(\mathcal{T}, o)$ (the total number of instances of a microservice type should
correspond to the sum of the instances locally deployed on each node). Constraint
3b checks that each node has enough resources to satisfy the requirements
of all the hosted microservices. The last two constraints define the optimization 
function used to minimize the total cost: constraint 3c introduces the boolean 
variable used(o) which is true if and only if node o contains at least one microser- 
vice instance; constraint 3d is the function to be minimized, i.e., the sum of the 
costs of the used nodes. 
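For very small inputs, the node part of Phase 1 can be solved by exhaustive search, which makes constraints 3a–3d concrete. The Python sketch below is deliberately exponential and stands in for the constraint/optimization solver only as an illustration. The cost 100 of the large nodes comes from Node1_large. The cost 199 of the xlarge node is the value consistent with the 598 cents per hour quoted for Fig. 1, assuming both large nodes cost 100 and both xlarge nodes cost the same. All RAM figures except the 4 units of Node1_large and the Message Receiver are hypothetical.

```python
from collections import Counter
from itertools import product


def cheapest_placement(inst, type_res, nodes):
    """Exhaustively solve the node part of Phase 1 for tiny inputs: every instance is
    placed on one node (3a), capacities are respected (3b), a node is used iff it
    hosts something (3c), and the cost of the used nodes is minimised (3d).
    Returns (cost, {(type, node): inst(T, o)}) or None if no placement fits."""
    instances = [t for t, k in inst.items() for _ in range(k)]
    best = None
    for choice in product(nodes, repeat=len(instances)):
        load = {o: Counter() for o in nodes}
        for t, o in zip(instances, choice):
            load[o].update(type_res[t])                 # add this instance's consumption
        if any(q > nodes[o]["res"].get(r, 0) for o in nodes for r, q in load[o].items()):
            continue                                    # violates (3b)
        cost = sum(nodes[o]["cost"] for o in set(choice))
        if best is None or cost < best[0]:
            best = (cost, dict(Counter(zip(instances, choice))))
    return best


nodes = {
    "large1": {"res": {"RAM": 4}, "cost": 100},
    "large2": {"res": {"RAM": 4}, "cost": 100},
    "xlarge1": {"res": {"RAM": 8}, "cost": 199},
}
type_res = {"MR": {"RAM": 4}, "MA": {"RAM": 3}, "AA": {"RAM": 3}}
cost, placement = cheapest_placement({"MR": 1, "MA": 1, "AA": 1}, type_res, nodes)
print(cost)  # 299
```

Here the two large nodes together cannot host the 10 units of RAM, so the optimum pairs one large node with the xlarge node.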

These constraints, and the optimization function, are expected to be given
as input to a constraint/optimization solver. If a solution is not found, it is not
possible to deploy the required microservice system; otherwise, the next phases 
of the algorithm are executed to synthesize the optimal deployment plan. 

Phase 2 The second phase consists of the generation of another set of con- 
straints that, once solved, indicates the bindings to be established between any 
pair of microservices to be deployed. More precisely, for each type $\mathcal{T}$ such that
$inst(\mathcal{T}) > 0$, we use $s_i^{\mathcal{T}}$, with $1 \le i \le inst(\mathcal{T})$, to identify the microservices of
type $\mathcal{T}$ to be deployed. We also assume a function $N$ that associates microservices
to available nodes $O$, which is compliant with the values $inst(\mathcal{T}, o)$ already
computed in Phase 1, i.e., given a type $\mathcal{T}$ and a node $o$, the number of $s_i^{\mathcal{T}}$, with
$1 \le i \le inst(\mathcal{T})$, such that $N(s_i^{\mathcal{T}}) = o$ coincides with $inst(\mathcal{T}, o)$.

In the constraints below we use the variables $b(p, s_i^{\mathcal{T}}, s_j^{\mathcal{T}'})$ (with $i \neq j$, if
$\mathcal{T} = \mathcal{T}'$): its value is 1 if there is a connection between the required interface
$p$ of $s_i^{\mathcal{T}}$ and the provided interface $p$ of $s_j^{\mathcal{T}'}$, and 0 otherwise. We use $n$ and
$m$ to denote $inst(\mathcal{T})$ and $inst(\mathcal{T}')$, respectively, and an auxiliary total function
$limProv(\mathcal{T}', p)$ that extends $\mathcal{T}'.\mathit{prov}$ by associating 0 with interfaces outside its
domain.


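$$\bigwedge_{\mathcal{T}' \in U} \; \bigwedge_{p \in \mathcal{I}(U)} \; \bigwedge_{j \in 1\ldots m} \; \sum_{\mathcal{T} \in U} \; \sum_{i \in (1\ldots n) \setminus \{j \mid \mathcal{T} = \mathcal{T}'\}} b(p, s_i^{\mathcal{T}}, s_j^{\mathcal{T}'}) \le limProv(\mathcal{T}', p) \tag{4a}$$

$$\bigwedge_{\mathcal{T} \in U} \; \bigwedge_{p \in \mathrm{dom}(\mathcal{T}.\mathit{reqs})} \; \bigwedge_{i \in 1\ldots n} \; \sum_{\mathcal{T}' \in U} \; \sum_{j \in (1\ldots m) \setminus \{i \mid \mathcal{T} = \mathcal{T}'\}} b(p, s_i^{\mathcal{T}}, s_j^{\mathcal{T}'}) \ge \mathcal{T}.\mathit{reqs}(p) \tag{4b}$$

$$\bigwedge_{\mathcal{T} \in U} \; \bigwedge_{p \in \mathrm{dom}(\mathcal{T}.\mathit{reqw})} \; \bigwedge_{i \in 1\ldots n} \; \sum_{\mathcal{T}' \in U} \; \sum_{j \in (1\ldots m) \setminus \{i \mid \mathcal{T} = \mathcal{T}'\}} b(p, s_i^{\mathcal{T}}, s_j^{\mathcal{T}'}) \ge \mathcal{T}.\mathit{reqw}(p) \tag{4c}$$

$$\bigwedge_{\mathcal{T} \in U} \; \bigwedge_{p \in \mathcal{I}(U) \setminus (\mathrm{dom}(\mathcal{T}.\mathit{reqs}) \cup \mathrm{dom}(\mathcal{T}.\mathit{reqw}))} \; \bigwedge_{i \in 1\ldots n} \; \bigwedge_{\mathcal{T}' \in U} \; \bigwedge_{j \in (1\ldots m) \setminus \{i \mid \mathcal{T} = \mathcal{T}'\}} b(p, s_i^{\mathcal{T}}, s_j^{\mathcal{T}'}) = 0 \tag{4d}$$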


Constraint 4a considers the provided interface capacities to fix upper bounds 
to the bindings to be established, while constraints 4b and 4c fix lower bounds 
based on the required interface capacities, considering both the strong (see 4b) and
the weak (see 4c) ones. Finally, constraint 4d indicates that it is not possible
to establish connections on interfaces that are not required. 
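A 0/1 assignment for the Phase 2 variables can be validated against 4a–4d with the Python sketch below. It assumes a hypothetical encoding in which `b` lists only the triples whose variable is 1, and it is a checker, not the solver that finds the assignment. The instance names follow the scaled Fig. 1 pipeline and are illustrative.

```python
from collections import Counter


def check_phase2(instances, types, b):
    """Evaluate (4a)-(4d) for a 0/1 assignment: b is the set of triples (p, s, s2)
    whose variable b(p, s, s2) is 1; every other variable is 0.
    instances: instance -> type name; types: name -> {"prov", "reqs", "reqw"}."""
    if any(s == s2 for _, s, s2 in b):       # no variable b(p, s, s) exists
        return False
    for s, t in instances.items():
        incoming = Counter(p for p, _, s2 in b if s2 == s)
        outgoing = Counter(p for p, s1, _ in b if s1 == s)
        # limProv(T, p): T.prov(p), or 0 when p is not provided by T.
        if any(k > types[t]["prov"].get(p, 0) for p, k in incoming.items()):        # (4a)
            return False
        if any(outgoing[p] < n for p, n in types[t]["reqs"].items()):               # (4b)
            return False
        if any(outgoing[p] < n for p, n in types[t]["reqw"].items()):               # (4c)
            return False
        required = types[t]["reqs"].keys() | types[t]["reqw"].keys()
        if any(p not in required for p in outgoing):                                # (4d)
            return False
    return True


types = {
    "MR": {"prov": {}, "reqs": {}, "reqw": {"MA": 3}},
    "MA": {"prov": {"MA": float("inf")}, "reqs": {"AA": 1}, "reqw": {}},
    "AA": {"prov": {"AA": 2}, "reqs": {}, "reqw": {}},
}
instances = {"mr": "MR", "ma1": "MA", "ma2": "MA", "ma3": "MA", "aa1": "AA", "aa2": "AA"}
b = {("MA", "mr", "ma1"), ("MA", "mr", "ma2"), ("MA", "mr", "ma3"),
     ("AA", "ma1", "aa1"), ("AA", "ma2", "aa1"), ("AA", "ma3", "aa2")}
print(check_phase2(instances, types, b))  # True
```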

A solution for these constraints exists because, as also shown in [13], the 
constraints 1a to 2e (already solved during Phase 1) guarantee that the configuration
to be synthesized contains enough capacity on the provided interfaces
to satisfy all the required interfaces. 

Phase 3 In this last phase we synthesize the deployment plan that, when
applied to the initial configuration $\mathcal{C}_0$, reaches a new configuration with nodes,
microservices and bindings as computed in the first two phases of the algorithm. 
Without loss of generality, in this decidability proof we show the existence of 
a simple plan that first removes the elements in the initial configuration and 
then deploys the target configuration from scratch. However, as also discussed 
in Sect. 3, in practice it is possible to define more complex planning mechanisms 
that reuse microservices already deployed.

Reaching an empty configuration is a trivial task since it is always possible 
to perform in the initial configuration unbind actions for all the bindings con- 
nected to weak required interfaces. Then, the microservices can be safely deleted. 
Thanks to the well-formedness assumption (Definition 2) and using a topological 
sort, it is possible to order the microservices to be removed without violating 
any strong required interface (e.g., first remove a microservice that no remaining
microservice requires through a strong required interface, and repeat until all the
microservices have been deleted).

The deployment of the target configuration follows a similar pattern. Given the distribution of microservices over nodes (computed in the first phase) and the corresponding bindings (computed in the second phase), the microservices can be created following a topological sort of the dependencies induced by the strong required interfaces. When all the microservices are deployed on the corresponding nodes, the remaining bindings (on weak required interfaces) may be added in any order.
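The tear-down-then-rebuild plan described above can be sketched in Python with the standard `graphlib.TopologicalSorter` (Python 3.9+). The configuration encoding and the action names `unbind`, `del`, `new`, and `bind` are hypothetical; in particular, we assume that strong bindings are created together with the requiring microservice and disappear when it is deleted.

```python
from graphlib import TopologicalSorter


def creation_order(instances, strong):
    """Order instances so that every provider of a strong binding comes first.

    Raises graphlib.CycleError if strong requirements are cyclic, which the
    well-formedness assumption is meant to exclude."""
    deps = {i: set() for i in instances}
    for _, requirer, provider in strong:
        deps[requirer].add(provider)
    return list(TopologicalSorter(deps).static_order())


def synthesize_plan(initial, target):
    """Naive plan: tear down the initial configuration, then build the target.

    Each configuration is a dict with hypothetical keys:
      'nodes'  : instance id -> node it runs on
      'strong' : set of (interface, requirer, provider) strong bindings
      'weak'   : set of (interface, requirer, provider) weak bindings
    """
    plan = [("unbind", *b) for b in sorted(initial["weak"])]
    # Delete requirers before the providers they strongly depend on.
    for i in reversed(creation_order(initial["nodes"], initial["strong"])):
        plan.append(("del", i))
    # Create providers first; each instance is created with its strong bindings.
    for i in creation_order(target["nodes"], target["strong"]):
        strong = sorted(b for b in target["strong"] if b[1] == i)
        plan.append(("new", i, target["nodes"][i], strong))
    # Weak bindings can be added in any order once every instance exists.
    plan += [("bind", *b) for b in sorted(target["weak"])]
    return plan
```

Reversing the creation order for deletion is the design choice that keeps every strong requirement satisfied: a microservice is removed only after all the microservices strongly depending on it are gone.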


Remark 1. The constraints generated during Phase 2 of the algorithm to establish the microservice bindings are intended to be given as input to a constraint/optimization solver. One can enrich such constraints with metrics to optimize, e.g., minimizing the number of non-local bindings (i.e., giving preference to connections among microservices hosted on the same node):


$$\min \sum_{T,T' \in \mathcal{U},\ i \in 1..inst(T),\ j \in 1..inst(T'),\ p \in \mathcal{I}(\mathcal{U}),\ N(s_i^T) \neq N(s_j^{T'})} b(p, s_i^T, s_j^{T'})$$


Another example, used in the case study discussed in Sect. 3, is the following metric, which maximizes the number of bindings⁴:


$$\max \sum_{s_i^T,\ s_j^{T'},\ p \in \mathcal{I}(\mathcal{U})} b(p, s_i^T, s_j^{T'})$$
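To illustrate how the first metric steers the choice of bindings, the sketch below replaces the constraint/optimization solver with exhaustive enumeration, which is only viable for toy instances. It assumes that each requirer needs exactly one binding on a single interface and that providers have finite capacities; `best_assignment` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import product


def best_assignment(requirers, providers, capacity, node_of):
    """Brute-force stand-in for the solver: bind each requirer to one provider,
    respecting provider capacities, and minimize the non-local bindings."""
    best, best_cost = None, None
    for choice in product(providers, repeat=len(requirers)):
        load = Counter(choice)
        if any(load[p] > capacity[p] for p in load):
            continue  # violates the provided-capacity upper bound
        # Count bindings whose endpoints are hosted on different nodes.
        cost = sum(node_of[r] != node_of[p] for r, p in zip(requirers, choice))
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(requirers, choice)), cost
    return best, best_cost
```

With two requirers on nodes `n1` and `n2` and two capacity-1 providers placed on the opposite nodes, the sketch pairs each requirer with the provider on its own node, so no binding crosses nodes.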


4 We model a load balancer as a microservice having a weak required interface, with arity 0, that can be provided by its back-end service. By adopting the above maximization metric, the synthesized configuration connects all possible services to such required interface, thus allowing the load balancer to forward requests to all of them.

Fig. 2. Microservice architecture for email processing.

From the complexity point of view, it is possible to show that the decision version of the optimization problem solved in Phase 1 is NP-complete, that of the problem solved in Phase 2 is in NP, while the plan in Phase 3 is synthesized in polynomial time.
Unfortunately, since the numbers in the constraints are represented in logarithmic space (a number n needs only O(log n) bits), the output of Phase 2, which requires the enumeration of all the microservices to deploy, can be exponential in the size of the output of Phase 1 (which indicates only the total number of instances for each type). For this reason, the optimal deployment problem is in NEXPTIME. However, we consider the deployment of an exponential number of microservices on a single node with limited resources to be infeasible in practice. If at most a polynomial number of microservices can be deployed on each node, the optimal deployment problem becomes an NP-optimization problem and its decision version is NP-complete. See the companion technical report [8] for the formal proofs of complexity.


3 Application of the Technique to the Case-Study 


Given the asymptotic complexity of our solution (NP under the assumption of polynomial size of the target configuration), we decided to evaluate its applicability in practice on a real-world microservice architecture, namely the email processing pipeline described in [22]. The considered architecture separates and routes the components found in an email (headers, links,
text, attachments) into distinct, parallel sub-pipelines with specific tasks (e.g., 
remove malicious attachments, tag the content of the mail). We report in Fig. 2 
a depiction of the architecture. When an email reaches the Message Receiver it 
is forwarded to the Message Parser, which sends each component into a specific 
sub-pipeline. In the sub-pipelines, some microservices — e.g., Text Analyzer and 
Attachment Analyzer — coordinate with other microservices — e.g., Sentiment 
Analyzer and Virus Scanner — to process their inputs. Each microservice in the 
architecture has a given resource consumption (expressed in terms of CPU and 
memory). As expected, the processing of each email component entails a specific 
load. Some microservices can handle large inputs, e.g., in the range of 40K simul- 
taneous requests (e.g., Header Analyzer that processes short and uniform inputs). 
Other microservices sustain heavier computations (e.g., Image Recognizer) and 
can handle smaller simultaneous inputs, e.g., in the range of 10K requests. 
To model the system above, we use the Abstract Behavioral Specification (ABS) language, a high-level object-oriented language that supports deployment modeling [31]. ABS is agnostic w.r.t. deployment platforms (e.g., Amazon AWS, Microsoft Azure) and technologies (e.g., Docker or Kubernetes) and it offers
high-level deployment primitives for the creation of new deployment components 
and the instantiation of objects inside them. Here, we use ABS deployment 
components as computation nodes, ABS objects as microservice instances, and 
ABS object references as bindings. Finally, to describe the requirements in our 
model, we use ABS with SmartDepl [25], an extension that supports deployment 
annotations. Strong required interfaces are modeled as class annotations indi- 
cating mandatory parameters for the class constructor: such parameters contain 
the references to the objects corresponding to the microservices providing the 
strongly required interfaces. Weak required interfaces are expressed as anno- 
tations concerning specific methods used to pass, to an already instantiated 
object, the references to the objects providing the weakly required interfaces. We 
define a class for each microservice type, plus one load balancer class for each 
microservice type. A load balancer distributes requests over a set of instances 
that can scale horizontally. Finally, we model nodes corresponding to Amazon 
EC2 instances: c4_large, c4_xlarge, and c4_2xlarge (with the corresponding 
provided resources and costs). 


Microservice (max computational load) | Initial (10K) | +20K | +50K | +80K
MessageReceiver (∞)
MessageParser (40K)
HeaderAnalyzer (40K)
LinkAnalyzer (40K)
TextAnalyzer (15K)
SentimentAnalyzer (15K)
AttachmentsManager (30K)
VirusScanner (13K)
ImageAnalyzer (30K)
NSFWDetector (13K)
ImageRecognizer (13K)
MessageAnalyzer (70K)


In the table above, we report the results of our algorithm for four incremental deployments: the initial one in column 2 and those under incremental loads in columns 3-5. We also assume that 40 nodes are available for each of the three node types. In the first column of the table, next to each microservice type, we report its maximum computational load, i.e., the maximum number of simultaneous requests it can manage. As visible in columns 2-5, different maximum computational loads imply different scaling factors for a given number of simultaneous requests. In the initial configuration we consider 10K simultaneous requests and we have one instance of each microservice type (and of the corresponding load balancer). The other deployment configurations deal with three scenarios of horizontal scaling, assuming three increasing increments of inbound messages (20K, 50K, and 80K). In the three scaling scenarios, we do not implement the planning algorithm described in Phase 3 of the proof of Theorem 1. Instead, we take advantage of the presence of the load balancers and, as described in Remark 1, we achieve a similar result with an optimization function that maximizes the number of bindings of the load balancers. For every scenario, we use SmartDepl [33] to generate the ABS code for the plan that deploys an optimal configuration, setting a timeout of 30 min for the computation of every deployment scenario⁵. The ABS code modeling the system and the generated code are publicly available at [7]. A graphical representation of the initial configuration is available in the companion technical report [8].
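As a back-of-the-envelope check of how maximum loads translate into scaling factors, the following sketch computes the smallest number of instances of a type such that an evenly balanced load stays within each instance's maximum. This is only a lower bound under our assumptions (perfect balancing, loads expressed in thousands of requests); the configurations actually synthesized also depend on node resources and costs, so they may differ. Whether a "+20K" scenario means 30K requests in total is also an assumption, which is why the load is left as a parameter.

```python
import math

# Maximum simultaneous requests per instance, in thousands, from the table above.
MAX_LOAD_K = {
    "MessageReceiver": math.inf, "MessageParser": 40, "HeaderAnalyzer": 40,
    "LinkAnalyzer": 40, "TextAnalyzer": 15, "SentimentAnalyzer": 15,
    "AttachmentsManager": 30, "VirusScanner": 13, "ImageAnalyzer": 30,
    "NSFWDetector": 13, "ImageRecognizer": 13, "MessageAnalyzer": 70,
}


def min_instances(load_k, max_load_k):
    """Smallest instance count keeping an evenly balanced load within capacity."""
    if math.isinf(max_load_k):
        return 1  # an unbounded type never needs to scale
    return math.ceil(load_k / max_load_k)
```

With 10K requests every type needs a single instance, consistent with the initial configuration described above; at a hypothetical total of 30K requests, a TextAnalyzer (15K) would already need two instances.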


4 Related Work and Conclusion 


In this work, we consider a fundamental building block of modern Cloud systems, microservices, and prove that the generation of a deployment plan for an architecture of microservices is decidable and fully automatable, spanning from the synthesis of the optimal configuration to the generation of the deployment
actions. To illustrate our technique, we model a real-world microservice archi- 
tecture in the ABS [31] language and we compute a set of deployment plans. 

The context of our work regards automating Cloud application deployment, 
for which there exist many specification languages [5,11], reconfiguration proto- 
cols [6,19], and system management tools [26,32,37,38]. Those tools support the 
specification of deployment plans but they do not support the automatic distri- 
bution of software instances over the available machines. The proposals closest to 
ours are those by Feinerer [20] and by Fischer et al. [21]. Both proposals rely on 
a solver to plan deployments. The first is based on the UML component model, 
which includes conflicts and dependencies, but lacks the modeling of nodes. The 
second does not support conflicts in the specification language. Neither proposal supports the computation of optimal deployments.

Three projects inspire our proposal: Aeolus [13,14], Zephyrus [1], and Conf- 
Solve [28]. The Aeolus model paved the way to reason on deployment and recon- 
figuration, proving some decidability results. Zephyrus is a configuration tool 
based on Aeolus and it constitutes the first phase of our approach. ConfSolve is 
a tool for the optimal allocation of virtual machines to servers and of applications to virtual machines. Neither Zephyrus nor ConfSolve synthesizes deployment plans.


5 Here, 30 min is a reasonable timeout since we predict different system loads and compute in advance a different deployment plan for each of them. Interesting future work would aim at shortening the computation to a few minutes (e.g., around the average start-up time of a virtual machine in a public Cloud) to obtain on-the-fly deployment plans tailored to unpredictable system loads.
Regarding autoscaling, existing solutions [2,4,17,29] support the automatic
increase or decrease of the number of instances of a service/container, when some 
conditions (e.g., CPU average load greater than 80%) are met. Our work is an 
example of how we can go beyond single-component horizontal scaling policies 
(as analyzed, e.g., in [9]). 

As future work, we want to investigate local search approaches to speed up the solution of the optimization problems behind the computation of a deployment plan. Shorter computation times would open our approach to contexts where it is infeasible to compute plans ahead of time, e.g., due to unpredictable loads.
Abstract. Data flow formalisms are commonly used to model systems in order to solve problems of buffer sizing and task scheduling. A prerequisite for static analysis of a modeled system is the existence of a periodic schedule in which the sizes of communication channels can be bounded for an unbounded execution (consistency), and in which communication dependencies do not introduce a deadlock (liveness). In the context of Cyber-Physical Systems, components are
often interfaced with the physical world and have frequency constraints. 
The existing data flow formalisms lack expressiveness to fully cover the 
expected behavior of these components. We propose an extension to Syn- 
chronous Data Flow (SDF) formalism, called Polygraph, that includes 
frequency constraints and adjustable communication rates. We show that 
with these extensions, the conditions for a model to be consistent and live 
are no longer sufficient, and we extend the corresponding theorems with 
necessary and sufficient conditions to preserve these properties. We also 
introduce a framework to check the liveness of a Polygraph model, imple- 
mented in the tool DIVERSITY, along with preliminary experiments to 
validate this approach. 


1 Introduction 


Context. Cyber-Physical Systems (CPS) are increasingly present in everyday 
life. In these systems, the components require a certain amount of input data 
to produce a known amount of output data, and some of them must do so 
in synchrony with a reference time scale. For example, the next generation of 
autonomous vehicles will heavily rely on sensor fusion systems to operate the 
car. Sensors and actuators have specified frequencies. To produce its output, the 
fusion kernel requires a certain number of samples from several sources, with a 
temporal correlation between them. 

Often, when implementing this kind of system, the prediction of its perfor- 
mance is important to the system designer. The performance prediction covers 
different characteristics of the system, including its throughput, memory foot- 
print, and latency. In distributed implementations of such systems, an analysis of 
the communications between the components is necessary to configure a network capable of meeting the application’s real-time requirements.

Data flow formalisms [3,14] can be used to perform this kind of performance 
analysis [4,5,10-12]. A prerequisite to analyze a model is the existence of a 
periodic schedule with two properties. The first property, consistency, requires 
that the sizes of the communication buffers remain bounded for an unbounded 
execution of the periodic schedule. In practice, if a model is not consistent, it 
is not possible to implement the communications without losing data samples. 
The second property, liveness, requires the absence of deadlocks in the schedule. 


Motivation and Goals. The limitation of the existing data flow formalisms to 
model the considered systems is the lack of expressiveness regarding the syn- 
chronization on a common time scale for different components. Overcoming this 
limitation is the subject of recent research work [6]. Our goal is to extend an 
existing data flow formalism for which the consistency and liveness properties of 
a given model are decidable. In doing so, we want to ensure that the expressive- 
ness extension does not impact the decidability of these properties. With this 
extension, all applicative constraints are taken into account when checking the 
prerequisites for a performance analysis. The verification can be performed independently of the characteristics of a particular implementation (like execution times or mapping), and the results are the same for different implementations. Moreover, the performance analysis can benefit from the additional information on
the system provided by the extension. 


Approach and Main Results. This paper introduces Polygraph, an extension to 
Synchronous Data Flow (SDF) [14] for specification of frequency constraints on 
the components. We use an arithmetic based on rational numbers to reason on 
data exchanges between components. We show that the theorems that provide 
a theoretical foundation for practical verification of consistency and liveness for 
an SDF model can be generalized to this new formalism. Finally, we propose 
a symbolic execution framework to decide the liveness of models expressed in 
Polygraph, in a way similar to [11,14]. 
The contributions of this work include: 


— a data flow formalism, called Polygraph, extending the well-known SDF [14] 
formalism, to support the synchronization of data production and consump- 
tion on a reference time scale; 

— a demonstration that the decidability of two classical properties of dataflow 
models, namely consistency and liveness, is preserved for this new formalism; 

— an adaptation to the new formalism of an existing symbolic execution tech- 
nique for evaluation of liveness in the DIVERSITY tool and initial experi- 
ments to validate this approach. 


Outline. The remainder of this paper is organized as follows. Section 2 gives an informal introduction to the proposed modeling approach, with a step-by-step explanation relying on an illustrative system. In Sect. 3, we formalize Polygraph and provide extended statements and a sketch of proof for the consistency and liveness theorems. Section 4 presents a framework to check the liveness property for Polygraph and a preliminary evaluation. In Sect. 5, we discuss related work, while Sect. 6 presents conclusion and perspectives.

Fig. 1. Motivating example: a data fusion system modeled as a data flow graph. The upper indexes “a” to “d” denote an amount of data exchanged by the components in different variants of the model. The rates denoted by upper index “d” are those of Polygraph, and initial conditions for this configuration are denoted by (i) and (ii).


2 Motivation and Running Example 


Running Example. To introduce the modeling approach behind Polygraph, we 
use a toy example of a data fusion system that could be integrated into the cockpit display of a car, depicted in Fig. 1. The system is composed of three
sensors producing data samples to be used by a data fusion component, and a 
display component. The function of the sensor components is to read the data 
from their sensors, while the function of the data fusion component is to compute 
a result based on this data. The function of the display component is to render 
the fusion result on a screen. To do so, the sensor components send the data to 
the fusion component, and the fusion component sends the result to the display 
component. The first sensor component is a video camera producing frames. The 
other two sensor components analyze radar and lidar based samples to produce 
a descriptor of the closest detected obstacles. The fusion component uses this 
information to draw the obstacle descriptors on the corresponding frame. 

The first step to model this system is to build a graph capturing data depen- 
dencies between the components. Each vertex of this graph models an actor, an 
abstract entity representing the function of a component. Each directed edge of 
the graph models a communication channel, the source actor being the producer
of data consumed by the destination actor. The structure of the graph in Fig. 1 
illustrates the dependencies in our example. The communication policy on the 
channels is First-In First-Out (FIFO), the write operation is non-blocking, and 
the read operation is blocking. On each channel, the atomic amount of data 
exchanged by the connected actors is called a token, and all write and read oper- 
ations are measured in tokens. An actor produces (resp. consumes) a certain 
number of tokens on a channel when it writes (resp. reads) the corresponding 
amount of data. With this policy, the graph can be viewed as a Kahn Process Network (KPN) [13]. In a KPN, the communications are determinate, but
in general it is not possible to decide if the sizes of the channels can be bounded 
for an unbounded execution of the system. 


Synchronous and Asynchronous Constraints. In practice, sensors and actuators 
have a fixed sampling rate, and the production of each data sample occurs at 
that specified frequency. To model these constraints, we propose to label some 
actors with frequencies, corresponding to the real-life constraint. An actor with a 
frequency label must fire at that frequency. We further detail this notion of firing 
below, but for now it is sufficient to say that the firing of an actor is an atomic 
process, during which it performs the actions and communications expected from 
the modeled component. A global clock provides ticks to synchronize the firing 
of frequency labeled actors. For our example, we consider the frequency labeling 
illustrated by Fig. 1. 

Generally, in real-life systems, computation kernels compute when input data 
is available and do not have frequency constraints. In our frequency labeling, the 
actors modeling such components can be left without a frequency label. In our 
example, this is the case for the fusion actor. 

The possibility to have unlabeled actors is an important part of our approach, as further discussed in Sect. 5. It allows mixing a synchronous firing policy for labeled actors with an asynchronous firing policy for unlabeled actors. This means that the scheduling of firings has periodic constraints only where needed, which offers more options for optimization algorithms.


Static Rates. Another characteristic of real-life software components in our con- 
text is that they require a fixed number of input samples from each different 
source. Also, there must be a correlation between the production time of the 
samples consumed from different sources. In our example, the fusion component 
requires one token from each sensor, and these samples must have a close-enough 
production time. This constraint can be captured by KPN restrictions, such 
as Synchronous Data Flow (SDF) [14]. In SDF, both ends of each channel are 
assigned a communication rate, denoting the fixed number of tokens produced or 
consumed by the connected actors’ firings. This characteristic makes it possible to decide whether the sizes of the channels are bounded for an unbounded execution.
Graphs respecting this property are said to be consistent. 
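Consistency of a plain SDF graph (ignoring frequencies) can be decided by solving the balance equations: for every channel, the production rate times the firing count of the source must equal the consumption rate times the firing count of the destination. The following Python sketch (function name hypothetical) propagates firing ratios over a connected graph with exact fractions and returns the smallest positive integer solution, or `None` when only the trivial solution exists.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, channels):
    """channels: list of (src, dst, prod_rate, cons_rate) with integer SDF rates.

    Assumes a connected graph. Returns the smallest positive integer firing
    counts q with prod * q[src] == cons * q[dst] on every channel, or None."""
    adj = {a: [] for a in actors}
    for src, dst, p, c in channels:
        adj[src].append((dst, Fraction(p, c)))  # q[dst] = q[src] * p / c
        adj[dst].append((src, Fraction(c, p)))  # q[src] = q[dst] * c / p
    q = {actors[0]: Fraction(1)}
    todo = deque([actors[0]])
    while todo:  # breadth-first propagation of firing ratios
        a = todo.popleft()
        for b, ratio in adj[a]:
            if b not in q:
                q[b] = q[a] * ratio
                todo.append(b)
    if any(p * q[s] != c * q[d] for s, d, p, c in channels):
        return None  # balance equations only admit the trivial solution
    scale = lcm(*(f.denominator for f in q.values()))
    ints = [int(q[a] * scale) for a in actors]
    g = gcd(*ints)
    return {a: v // g for a, v in zip(actors, ints)}
```

With the rates of variant “a” (all equal to 1), every actor fires once per iteration, so the graph is consistent as long as frequencies are ignored.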

Without taking frequencies into account, the communication rates denoted 
by an upper index “a” in Fig. 1 match the description of the system. Indeed, the 
sensor actors produce one token each, the fusion actor consumes these tokens, and in turn produces one token to be consumed by the display actor. With these rates, starting from a marking of the graph with any number of tokens stored in the channels, if all the actors fire once, the same number of tokens remains in the channels. Hence, the SDF graph is consistent. But when taking frequencies
into account, the graph is no longer consistent. In this example, the camera 
produces 30 tokens per second, the radar produces 120 tokens per second, and 
the lidar produces 10 tokens per second. This means that per second, because 
of the production rate and frequency of the lidar, the fusion actor will be able 
to fire only 10 times. It will consume only 10 tokens from the camera and radar 
actors, leaving 20 and 110 unconsumed tokens per second on their respective 
channels. Hence, it is no longer possible to bound the size of these channels for 
an unbounded execution of the graph. This shows that to achieve consistency, for 
any frequency labeled actor, the number of asynchronous firings of its unlabeled 
predecessors and successors should be limited. 
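The per-second arithmetic above can be reproduced with a short Python sketch. It assumes, as in the text, that the untimed fusion actor fires as often as its slowest input allows; the function names are hypothetical and rates are per-firing token counts.

```python
from fractions import Fraction


def untimed_rate(inputs, freq):
    """Highest sustainable firing frequency of an untimed actor, set by its slowest input.

    inputs: list of (source actor, production rate, consumption rate) per input channel
    freq:   firing frequency (Hz) of each timed source actor
    """
    return min(Fraction(prod * freq[src], cons) for src, prod, cons in inputs)


def surplus(inputs, freq, rate):
    """Tokens per second that accumulate, unconsumed, on each input channel."""
    return {src: prod * freq[src] - cons * rate for src, prod, cons in inputs}


freq = {"camera": 30, "radar": 120, "lidar": 10}
variant_a = [("camera", 1, 1), ("radar", 1, 1), ("lidar", 1, 1)]
fusion_rate = untimed_rate(variant_a, freq)  # limited by the lidar
```

Here `fusion_rate` is 10 and the surplus is 20 and 110 tokens per second on the camera and radar channels. The same functions also confirm that the rates of variant “b” discussed below balance every input channel at 30 firings per second.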

A possible adaptation of communication rates, denoted by upper index “b” 
in Fig. 1, takes frequency inheritance into account and restores the consistency 
property. With the production and consumption rates both set to 1 on the 
channel connecting the camera and the fusion actors, the fusion actor basically 
inherits a frequency constraint of 30 Hz. It inherits the same frequency constraint from the radar and lidar actors since it now consumes 4 × 30 = 1 × 120 tokens per second from the radar, and 1 × 30 = 3 × 10 tokens per second from the
lidar. The rates on the channel connecting the fusion and display actors are also 
balanced. But with these rates, the number of tokens does not reflect accurately 
the expected behavior of the modeled components. For example, the fusion actor 
would consume 4 tokens per activation from the radar actor, while in reality the 
component only requires 1. 


Cyclo-Static Rates. It is possible to use Cyclo-Static Data Flow (CSDF) [3] 
to get closer to the real communication requirements. In CSDF, the rates of 
the actors are fixed as in SDF, but the successive firings of an actor cyclically 
consume and produce a different number of tokens on every connected channel. 
The successive rates on each channel are expressed as a sequence of natural 
numbers. For example, an actor with a cyclo-static sequence of output rates 
[1,2] produces 1 token for its first firing, 2 tokens for the second, 1 for the third 
and so on. A zero rate may occur in the sequence, meaning that the actor does 
not push or pull tokens on the channel for the corresponding firing. 

Fig. 2. Firings of actors of the motivating example: the firings are identified by the initial letter of the corresponding actor and the rank of the firing, arrows show data dependencies between firings, and a reference time scale constrains the firing of timed actors. The data dependencies marked by a cross in (a) introduce a causality issue.

A cyclo-static sequence is necessary on a channel if the connected actors have frequency constraints conflicting with the expected communication behavior. In this case, we propose that one of the actors be chosen as having the reference frequency for the communication, and that the other actor adapt its communication rate to a cyclo-static sequence accordingly. Back to our example (see variant “c” in Fig. 1), the fusion actor requires one token from each sensor at every firing. Since the component is synchronized on camera frames, we decide that the actor’s reference frequency should be 30 Hz. In this case, the frequency constraints do not conflict with the expected communication behavior, and we assign production and consumption rates of 1 on the channel connecting the fusion and camera actors. Now, considering the radar actor, the fusion actor only requires 30 tokens per second out of 120. Considering this ratio, we assign the sequence [0, 0, 0, 1] as production rates for the radar actor, and the rate 1 for the fusion actor. The same logic applies to the lidar actor: the fusion actor requires 30 tokens per second, but only 10 tokens per second are produced. We then assign the cyclo-static sequence [1, 0, 0] as consumption rates for the fusion actor, and the rate 1 for the lidar actor. A similar logic is applied to the display actor. The consequence on the stream of actual data values highly depends on the implemented function, and is therefore out of the scope of the data flow modeling. In the particular case of the radar actor in our example, the software implementation could perform a downsampling of the sensed data, or just send the latest sample.

The corresponding communication rates, denoted by upper index “c” in Fig. 1, give a graph where only the required tokens are exchanged on the channels, and the consistency property is preserved. But in general, choosing the appropriate cyclic rate sequences for all the channels in a graph is time consuming and error prone.
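With cyclo-static rates, balance can be checked by counting the tokens moved over one second, cycling through each sequence. The sketch below (hypothetical names) does this for the three sensor channels of variant “c”, with the fusion actor firing at its 30 Hz reference frequency; the display channel is omitted because its rates are not given in the text.

```python
def tokens_moved(firings, seq):
    """Tokens moved by `firings` successive firings that follow cyclo-static sequence `seq`."""
    full, rest = divmod(firings, len(seq))
    return full * sum(seq) + sum(seq[:rest])


def channel_balance(prod_freq, prod_seq, cons_freq, cons_seq):
    """Net tokens accumulated on a channel over one second (0 means balanced)."""
    return tokens_moved(prod_freq, prod_seq) - tokens_moved(cons_freq, cons_seq)


# Variant "c": (producer Hz, producer sequence, consumer Hz, consumer sequence).
variant_c = {
    "camera->fusion": (30, [1], 30, [1]),
    "radar->fusion": (120, [0, 0, 0, 1], 30, [1]),
    "lidar->fusion": (10, [1], 30, [1, 0, 0]),
}
balances = {ch: channel_balance(*args) for ch, args in variant_c.items()}
```

Every entry of `balances` is 0: the radar's 120 firings emit 30 tokens and the fusion actor's 30 firings take 10 tokens from the lidar, so only the required tokens travel on each channel.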


Rational Rates. We propose instead to extend the SDF model with rational communication rates. A rational communication rate r = p/q specifies that the actor produces or consumes p tokens every q firings, and the natural number of tokens produced or consumed by any firing is r rounded either up or down, denoted ⌈r⌉ and ⌊r⌋ respectively. With the semantics formalized in the next section, there is a unique default cyclo-static sequence that corresponds to a given rational rate. The default sequences for the rates denoted by an upper index “d” in Fig. 1 are those denoted by upper index “c”. As explained earlier when assigning cyclo-static sequences, in this extension only one rate on a given channel can be a rational number with denominator greater than one. The methodology remains the same: for any channel, one actor’s frequency is considered as a reference, and the other one adapts its rates according to that reference.
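One way to obtain the default sequence of a rational rate, consistent with the example above, is sketched below in Python. It is our reading of the semantics rather than the formal definition given in the next section: a producer with rate p/q emits ⌊(n+1)p/q⌋ - ⌊np/q⌋ tokens at its n-th firing (counting from 0), while a consumer takes ⌈(n+1)p/q⌉ - ⌈np/q⌉. Under this reading, rate 1/4 on the radar's output yields [0, 0, 0, 1] and rate 1/3 on the fusion actor's lidar input yields [1, 0, 0], matching variant “c”. The function names are hypothetical.

```python
from fractions import Fraction


def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def default_sequence(rate, producer):
    """Cyclo-static sequence of q token counts for a rational rate p/q.

    Assumed reading: the producer emits a token as soon as the accumulated
    amount crosses an integer (floor), while the consumer takes a token as
    soon as it starts needing one (ceiling)."""
    r = Fraction(rate)
    p, q = r.numerator, r.denominator
    if producer:
        return [(n + 1) * p // q - n * p // q for n in range(q)]
    return [ceil_div((n + 1) * p, q) - ceil_div(n * p, q) for n in range(q)]
```

Each element is either ⌊r⌋ or ⌈r⌉ and the q elements sum to p; for instance, a production rate of 5/2 gives [2, 3].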
Initial Conditions. With the frequency labeling and rational communication rates, we obtain a model that describes as closely as possible the communication and timing requirements of our illustrative example. But there are causality issues in this model. Figure 2(a) illustrates the timing of actor firings in our example, and the data dependencies between them, according to the semantics defined in the next section. The data dependencies marked by a cross are clearly not satisfied in time.

This kind of causality issue can also appear in SDF: in the case of cyclic 
graphs, the firings of the actors in a cycle all depend on each other. To prevent 
this, it is possible to mark the channels with an initial number of tokens, allowing 
sufficient initial firings to complete the firing of all actors in the cycle. The 
liveness property of an SDF graph is verified when all the cycles in the graph are 
marked with enough tokens to prevent a deadlock [14]. With the SDF extensions 
we propose, this condition is no longer sufficient. We need to be able to shift the 
production or consumption of tokens in order to make sure that when a firing 
requires input tokens, they are produced at an earlier tick of the global clock. 

One way to achieve this is to rotate the default sequences defined by the 
rational rates. For this, we propose a rational initial marking of the graph. Each 
channel with natural rates at both ends can be marked with an initial number 
of tokens as in SDF. Any other channel, with rational rate r = p/q on either end, can be initially marked with a rational number n + k/q with k < q, which denotes that the channel initially holds n tokens (as in SDF) and that the default sequence is rotated by k. If the rational rate is on the producer, the default sequence is rotated left; otherwise it is rotated right. In Fig. 1, considering the default sequences denoted by “c”, the corresponding rational rates denoted by upper index “d”, and the initial marking (ii), the marking of 3/4 on the channel connecting the radar and fusion actors rotates the default sequence [0, 0, 0, 1] by 3 elements to the left (the rational rate being on the producer), yielding the sequence [1, 0, 0, 0].
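The effect of a rational initial marking can be checked by simulating one end of a channel while tracking its rational state, with the number of tokens given by the floor of that state (as in Definition 2 below). In the Python sketch below (hypothetical names), a producer-side rate reproduces a left rotation of the default sequence and a consumer-side rate a right rotation, in line with the rule stated above; for the radar channel with marking 3/4 the result is [1, 0, 0, 0], i.e., [0, 0, 0, 1] rotated left by 3.

```python
from fractions import Fraction
from math import floor


def rotate_left(seq, k):
    k %= len(seq)
    return seq[k:] + seq[:k]


def rotate_right(seq, k):
    return rotate_left(seq, -k)


def simulated_sequence(rate, producer, marking):
    """Token counts over one cycle of q firings, tracking the rational channel
    state from an initial marking n + k/q (only this end of the channel fires;
    the integer part n shifts the token count but not the sequence)."""
    r = Fraction(rate)
    state = Fraction(marking)
    seq = []
    for _ in range(r.denominator):
        new = state + r if producer else state - r
        seq.append(abs(floor(new) - floor(state)))  # tokens = floor of the state
        state = new
    return seq
```

With a zero marking the simulation returns the default sequences themselves, e.g., [0, 0, 0, 1] for the radar's production rate 1/4.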

Another way to prevent unsatisfied data dependencies is to shift the first 
tick on which a frequency labeled actor must fire. We propose to add a phase to 
each of these actors, giving the offset from the first tick at which it must fire. 
With the semantics formalized in the next section, that phase is constrained in
order to have a periodic global clock. Figure 2(b) takes into account the marking 
and phase denoted (ii) in Fig. 1. With the rational marking, the dependencies 
between the radar and fusion firings are now satisfied, and with the phase on 
the display actor, the dependencies between the camera and display firings are 
also satisfied. 


3 Formalization of the Polygraph Model 


We denote by $\mathbb{B}$ the set $\{0, 1\}$, by $\mathbb{Z}$ the set of integers, by $\mathbb{N} = \{n \in \mathbb{Z} \mid n \geq 0\}$ the set of natural integers, and by $\mathbb{Q}$ the set of rational numbers. For any set $S$, the free semigroup on $S$ is denoted $S^+$.


System graph. A system graph is a structure used to represent the topology of the communications. Formally, it is a connected finite directed graph $G = (V, E)$ with set of vertices $V$ and set of edges $E \subseteq V \times V$, such that $V$ is the set of actors and $E$ is the set of channels. We use an index notation to identify elements with respect to a given actor or channel, considering that $E$ and $V$ are sets indexed respectively in $\{1, \dots, |E|\}$ and $\{1, \dots, |V|\}$. We denote by $v_i$ (resp. $e_j$) the actor (resp. channel) of index $i$ (resp. $j$). For an actor $v \in V$, let $in(v) = \{(v', v) \in E \mid v' \in V\}$ denote the set of input channels of $v$ and $out(v) = \{(v, v') \in E \mid v' \in V\}$ the set of output channels of $v$.


Topology matrix and channel states. As for SDF and its variants [3,14], the communication rates are defined by a topology matrix with one row per channel
and one column per actor. The only difference in this definition is that we rely 
on rational numbers. The absolute value of a rate in the matrix defines how 
many tokens are produced or consumed per firing of the corresponding actor 
on the corresponding channel, and the sign of that rate indicates if the tokens 
are produced (positive rate) or consumed (negative rate). For a given actor and 
channel, the rate must be 0 if the actor is not connected to the channel, or if the 
actor is connected to both ends of the channel. 


Definition 1 (Topology matrix). A matrix $\Gamma = (\gamma_{ij}) \in \mathbb{Q}^{|E| \times |V|}$ is a
topology matrix of a system graph $G$ if for every channel $e_i = (v_j, v_k) \in E$ we have:

– $\gamma_{il} = 0$ for all $l \neq j, k$;

– if $j \neq k$, then $\gamma_{ij} > 0$ and $\gamma_{ik} < 0$ are irreducible fractions, and at most one
of them has a denominator greater than 1;

– if $j = k$, then $\gamma_{ij} = 0$.
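The three conditions of Definition 1 are easy to check mechanically. The following is a minimal Python sketch under our own encoding assumptions: the matrix is a list of rows (one per channel), `channels[i] = (j, k)` gives the ends of channel $e_i$ with 0-based actor indices, and `is_topology_matrix` is a hypothetical helper name.

```python
from fractions import Fraction


def is_topology_matrix(gamma, channels):
    """Check the conditions of Definition 1 (hypothetical helper).

    gamma[i] is the row of channel e_i; channels[i] = (j, k) means that
    e_i goes from actor v_j to actor v_k (0-based indices).
    """
    for i, (j, k) in enumerate(channels):
        row = [Fraction(g) for g in gamma[i]]
        # Actors that are not an end of e_i must have a zero rate.
        if any(row[l] != 0 for l in range(len(row)) if l not in (j, k)):
            return False
        if j == k:
            # Self-loop: the rate of the actor on its own channel is 0.
            if row[j] != 0:
                return False
            continue
        producer, consumer = row[j], row[k]
        if not (producer > 0 and consumer < 0):
            return False
        # Fraction always stores reduced values, so irreducibility holds;
        # at most one of the two rates may have a denominator above 1.
        if producer.denominator > 1 and consumer.denominator > 1:
            return False
    return True
```

With the rates 1/2 and −1 a channel satisfies Definition 1, whereas 1/2 and −1/3 violates it because both denominators exceed 1.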


We also use a rational number per channel to track the communication state 
of the system during an execution. A channel state is a vector with one row per 
channel. Each coordinate in the vector tracks the respective number of firings 
of the connected actors, by addition of their rates when they fire, and that 
coordinate rounded down is the number of tokens in the channel. 


Definition 2 (Channel state). A vector $c \in \mathbb{Q}^{|E| \times 1}$ is a channel state of a
system graph $G$ with topology matrix $\Gamma$ if for every channel $e_i = (v_j, v_k) \in E$,
the denominator of $c_i$ is the maximum between the denominators of $\gamma_{ij}$ and $\gamma_{ik}$,
and $\lfloor c_i \rfloor$ is the number of tokens in the channel. We denote $C \subset \mathbb{Q}^{|E| \times 1}$ the set
of all these possible states.


Timed actors and global clock. The actors of a subset $V_T \subseteq V$, called timed actors,
are constrained by a frequency, expressed as a strictly positive natural number. We use a
frequency mapping $\omega : V_T \to \mathbb{N}^*$ in order to map the timed actors to their
frequency. There is an implicit system time unit, and each timed actor $v_i \in V_T$
is supposed to be fired exactly $\omega_i := \omega(v_i)$ times per system time unit. In order
to have a minimal system time unit, we consider that the greatest common divisor
of all the frequencies is $\gcd(\omega[V_T]) = 1$. This is not limiting, since any set of
frequencies and system time unit can be adjusted to fit this constraint.

In addition, the timed actors must fire synchronously with respect to a global
clock. The resolution of that global clock is a sufficient number of ticks per system
time unit to associate to each tick the set of timed actors that must fire at the
corresponding date. For this, we consider the ticks $0, 1, \dots, \pi - 1$ per system
time unit, where $\pi$ is the least common multiple of all the actor frequencies,
$\pi = \mathrm{lcm}(\{\omega_i \mid v_i \in V_T\})$. Note that if $V_T$ is empty, $\pi = 1$, and the global clock
does not constrain the firing of any actor.

Given a timed actor $v_i \in V_T$, there should be $\omega_i$ out of $\pi$ ticks associated
with that actor's firings. To reflect the periodic nature of the firing of timed
actors, a timed actor $v_i$ of period $p_i = \pi / \omega_i$ fires every $p_i$-th tick.

As mentioned in Sect. 2, all the timed actors have a phase. We use a phase
mapping $\varphi : V_T \to \mathbb{N}$ to map the timed actors to their phase. The first firing
of each timed actor $v_i \in V_T$ occurs at the tick $\varphi_i := \varphi(v_i)$. The only constraint
to respect the expected frequency of the firings is that $\forall v_i \in V_T$ we have
$0 \leq \varphi_i < \pi / \omega_i$.


Definition 3 (Global clock, firing ticks). For a system graph $G$ with frequency
mapping $\omega$, resolution $\pi$, and phase mapping $\varphi$, the global clock is a set
$T = \{0, 1, \dots, \pi - 1\}$ and for each timed actor $v_i \in V_T$ there is a subset of firing
ticks $T_i = \{\tau \in T \mid \tau \equiv \varphi_i \pmod{\pi / \omega_i}\}$.
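As an illustration of Definition 3, the sketch below computes $\pi$ and the sets $T_i$ in Python from a frequency mapping and a phase mapping given as dictionaries; the function name and the actor names `a` and `b` used in the example are hypothetical. For $\omega_a = 2$ and $\omega_b = 3$ we get $\pi = 6$ and periods 3 and 2, so with $\varphi_a = 1$ and $\varphi_b = 0$ the firing ticks are $\{1, 4\}$ and $\{0, 2, 4\}$.

```python
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def global_clock(freq, phase):
    """Resolution pi and firing ticks T_i of Definition 3 (hypothetical helper).

    freq maps each timed actor to omega_i, phase maps it to phi_i.
    """
    if not freq:
        return 1, {}  # no timed actor: the clock has a single tick
    if gcd(*freq.values()) != 1:
        raise ValueError("frequencies must have gcd 1")
    pi = lcm(*freq.values())
    ticks = {}
    for v, w in freq.items():
        period = pi // w  # p_i = pi / omega_i
        if not 0 <= phase[v] < period:
            raise ValueError(f"phase of {v} must lie in [0, {period})")
        ticks[v] = [tau for tau in range(pi) if tau % period == phase[v]]
    return pi, ticks
```

Each resulting list contains exactly $\omega_i$ ticks, one in every block of $\pi / \omega_i$ consecutive ticks.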


Polygraphs. We now define the notion of polygraph which introduces a basic 
communication topology, a topology matrix, a frequency and phase mapping for 
all timed actors, and an initial marking of the graph. 


Definition 4 (Polygraph, initial marking). A polygraph is a tuple $P =
(G, \Gamma, \omega, \varphi, m)$ where $G$ is a system graph, $\Gamma$ is a topology matrix, $\omega$ is a frequency
mapping, $\varphi$ is a phase mapping and $m \in C$ is an initial marking such that
$\forall e_i \in E$ we have $m_i \geq 0$.


In the following, we consider that a polygraph $P = (G, \Gamma, \omega, \varphi, m)$ is given,
with its global clock $T$ and sets of firing ticks $T_i$ for all the timed actors $v_i \in V_T$.


States and transitions. The state of a polygraph is composed of a channel state, 
the current tick of the global clock, and a vector with one row per actor used 
to track the number of firings of the timed actors since the last change in the 
current tick. This tracking vector is used to check that the timed actors respect 
their synchronous firing constraints. 


Definition 5 (State). A state of a polygraph $P$ is a tuple $s = (c, \tau, a)$ where
$c \in C$ is a channel state, $\tau \in T$ is a tick, and $a \in \mathbb{N}^{|V| \times 1}$ is a tracking vector.
We denote $S \subseteq C \times T \times \mathbb{N}^{|V| \times 1}$ the set of all possible states for $P$.


The effect of the firing of an actor on the channel state is to add its rates to
the respective coordinate of all the channels. For an actor $v_i$, the $i$-th column
of $\Gamma$ gives all the rates per channel. Therefore, to extract that column from the
matrix for each actor $v_i \in V$, we use a unitary firing vector $u \in \mathbb{B}^{|V| \times 1}$, such
that $u_i = 1$, and for all $j \neq i$ we have $u_j = 0$. We denote $U \subset \mathbb{B}^{|V| \times 1}$ the set
of these vectors, and for convenience we denote the unitary activation vector of
actor $v_i$ by $u_i$. With the unitary firing vector of any actor $v_i$, the product $\Gamma u_i$
gives a vector holding for each channel $e_j$ the rate of $v_i$ on $e_j$. For any channel
state $c$, the channel state after the atomic firing of $v_i$ is then $c + \Gamma u_i$. Also,
the firing of a timed actor is tracked by adding its unitary firing vector to the
tracking vector. The firing of an actor has no effect on the current tick.


Definition 6 (Fire). For a polygraph $P$, the mapping $fire : U \times S \to S$ maps
a unitary activation vector $u_i$ and a state $s = (c, \tau, a)$ to the state $s' = (c', \tau', a')$
such that we have $c' = c + \Gamma u_i$, $\tau' = \tau$, and if $v_i \in V_T$ then $a' = a + u_i$, otherwise
$a' = a$.


Remark 1. For two consecutive firings of any actors $v_i$ and $v_j$ from a state $s =
(c, \tau, a)$, the resulting state $s'' = (c'', \tau'', a'')$ does not depend on the order of
the firings, and $c'' = c + \Gamma(u_i + u_j)$. This property can be generalized to any
finite number of consecutive firings.


The other possible transition between two states occurs when the global clock 
ticks. When the global clock ticks, the channel state is not changed, the current 
tick is adjusted, and the tracking vector is reset. 


Definition 7 (Tick). For a polygraph $P$, the mapping $tick : S \to S$ maps
a state $s = (c, \tau, a)$ to the state $s' = (c', \tau', a')$ such that we have $c' = c$,
$\tau' = (\tau + 1) \bmod \pi$, and $a' = 0$.
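The two transition mappings can be sketched in Python as follows, assuming a state is a tuple `(c, tau, a)` whose channel state uses exact `Fraction` values, `gamma` is a list of rows (one per channel) and `timed` is the set of indices of timed actors; these encodings and names are ours. The token count of a channel is its coordinate rounded down, as in Definition 2.

```python
from fractions import Fraction
from math import floor


def fire(gamma, timed, i, state):
    """Definition 6: fire actor v_i from state (c, tau, a)."""
    c, tau, a = state
    # c' = c + Gamma u_i: add column i of Gamma to every channel coordinate.
    c2 = tuple(cj + Fraction(row[i]) for cj, row in zip(c, gamma))
    # a' = a + u_i for timed actors only; the tick is unchanged.
    a2 = tuple(ak + 1 if (k == i and i in timed) else ak for k, ak in enumerate(a))
    return (c2, tau, a2)


def tick(pi, state):
    """Definition 7: advance the clock modulo pi and reset the tracking vector."""
    c, tau, a = state
    return (c, (tau + 1) % pi, tuple(0 for _ in a))


def tokens(c):
    """Number of tokens per channel: each coordinate rounded down."""
    return [floor(cj) for cj in c]
```

For instance, on a single channel $v_0 \to v_1$ with rates 1/2 and −1, two firings of $v_0$ (separated by a tick) are needed before the channel holds one token.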


Executions. The state of $P$ can evolve by successive application of either fire or
tick. An execution of $P$ is a sequence of such applications starting from a state
$s_1 \in S$ and leading to states $e = s_1 \cdots s_n \in S^+$. However, with the frequency
constraints, there are some conditions for the applications.

Consider the firing $fire(u_i, s)$ of a timed actor $v_i$ in a state $s = (c, \tau, a)$. In
this case, $v_i$ may fire only if the current tick $\tau$ is one of its firing ticks, i.e. $\tau \in T_i$.
Since it must fire exactly once on such a tick, an additional constraint to fire a
timed actor $v_i$ is that it has not fired yet, i.e. its coordinate in the tracking vector
$a$ is $a_i = 0$. To capture this constraint, we define a tick firing vector $t^\tau \in \mathbb{B}^{|V| \times 1}$
for each tick $\tau \in T$, in which a coordinate is set to one if the corresponding
actor is expected to fire at tick $\tau$. More formally, for any $v_i \in V \setminus V_T$ we have
$t^\tau_i = 0$, and for any $v_i \in V_T$ we have $t^\tau_i = 1$ if $\tau \in T_i$, and $t^\tau_i = 0$ otherwise. The
constraint to fire $v_i \in V_T$ in a state with current tick $\tau$ and tracking vector $a$ is
then $a_i < t^\tau_i$.

The clock update $tick(s)$ in a state $s = (c, \tau, a)$ is also subject to a constraint:
the timed actors that were supposed to fire synchronously with the current tick
have done so exactly once, i.e. $a = t^\tau$.


Definition 8 (Synchronous execution). An execution $e = s_1 \cdots s_n \in S^+$ of
a polygraph $P$ is synchronous if $\forall 1 \leq k < n$, we have $s_k = (c, \tau, a)$ such that:

– either $s_{k+1} = fire(u_i, s_k)$ for some $v_i \in V$, and in addition, if $v_i \in V_T$, then
$a_i < t^\tau_i$,
– or $s_{k+1} = tick(s_k)$, and in addition, $a = t^\tau$.
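Definition 8 can be checked by replaying an execution. This minimal Python sketch (hypothetical helper names) starts from tick 0 with a zero tracking vector, represents an execution only by its sequence of actions `("fire", i)` and `("tick",)`, and ignores channel states, which Definition 8 does not constrain.

```python
def tick_vector(n, firing_ticks, tau):
    """t^tau: 1 for each timed actor whose firing ticks contain tau, else 0."""
    return [1 if tau in firing_ticks.get(i, ()) else 0 for i in range(n)]


def is_synchronous(n, pi, firing_ticks, actions):
    """Check Definition 8 for a run starting at tick 0 with a = 0.

    firing_ticks maps each timed actor index to its set T_i; actions is a list
    of ("fire", i) or ("tick",). Channel states are ignored: Definition 8 only
    constrains the current tick and the tracking vector.
    """
    tau, a = 0, [0] * n
    for action in actions:
        t = tick_vector(n, firing_ticks, tau)
        if action[0] == "fire":
            i = action[1]
            if i in firing_ticks:          # timed actor: require a_i < t^tau_i
                if not a[i] < t[i]:
                    return False
                a[i] += 1
        elif action[0] == "tick":
            if a != t:                     # every expected firing happened once
                return False
            tau, a = (tau + 1) % pi, [0] * n
        else:
            raise ValueError(f"unknown action {action!r}")
    return True
```

For a two-tick clock where timed actor 0 fires only on tick 0 and actor 1 is untimed, the run fire 0, fire 1, tick, fire 1, tick is synchronous, while ticking before actor 0 has fired, or firing it twice on the same tick, is not.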
Until now, we considered executions of a polygraph where the order of the
firings is constrained only by the frequencies. However, for an actor to fire, there
must be enough tokens on its input channels, or its rational communication rate
must allow firings consuming 0 tokens. In order to fire an actor $v_i$ in a state
$s = (c, \tau, a)$, we require that for each input channel $e_j$ of $v_i$, since the rate $\gamma_{ji}$ is
negative, the channel state $c_j$ must be large enough to avoid reaching a negative
state, i.e. $c_j + \gamma_{ji} \geq 0$, or equivalently $c_j \geq |\gamma_{ji}|$. This constraint requires an
ordering of the actor firings such that a producer is fired a sufficient number of
times for a consumer to be able to fire in turn.


Definition 9 (Non-blocking execution). An execution $e = s_1 \cdots s_n \in S^+$ of
a polygraph $P$ is non-blocking if $\forall 1 \leq k < n$, we have $s_k = (c, \tau, a)$ such that:

– either $s_{k+1} = fire(u_i, s_k)$ for some $v_i \in V$, and in addition, $\forall e_j \in in(v_i)$,
$c_j \geq |\gamma_{ji}|$,
– or $s_{k+1} = tick(s_k)$.

Consistency property. If verified, the consistency property of $P$ guarantees that
it is possible to build a synchronous execution $e = s_1 \cdots s_n \in S^+$ such that
$s_1 = (m, 0, 0)$ and $s_1 = s_n$. Such an execution is called a consistent execution
of $P$, and can obviously be repeated an indefinite number of times to build a
consistent execution of arbitrary length. [14, Theorem 1] states that a necessary
and sufficient condition for a given SDF graph to be consistent is that there is a
non-trivial solution $x$ to $\Gamma x = 0$.

To extend this result to polygraphs, as explained in the previous section, we
need to take into account the frequencies of the timed actors. In other words, we
need to make sure that it is possible to have a synchronous execution with $x_i$
firings per actor $v_i$. The additional constraint due to the frequencies is that the
number of firings $x_i$ of all the timed actors $v_i$ corresponds to a number $r \in \mathbb{N}$ of
repetitions of the global clock period.

To state the conditions for a polygraph to be consistent, we thus want to
separate the number of firings of the timed actors from the others. We define the
vector $t = \sum_{\tau \in T} t^\tau$ giving for each timed actor $v_i$ the number $t_i$ of expected
firings per period of the global clock. We then define the set $Y \subset \mathbb{N}^{|V| \times 1}$ of
vectors $y$ such that we have a number of firings $y_i \neq 0$ only for $v_i \in V \setminus V_T$.
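The vector $t$ can be made explicit. Since $0 \leq \varphi_i < \pi / \omega_i$ and $\pi / \omega_i$ divides $\pi$, each of the $\omega_i$ consecutive blocks of $\pi / \omega_i$ ticks contains exactly one firing tick of $v_i$, which gives the following short derivation (ours) from Definition 3:

```latex
t_i = \sum_{\tau \in T} t^{\tau}_i = |T_i|
    = \Bigl|\{\tau \in \{0,\dots,\pi-1\} \mid \tau \equiv \varphi_i \pmod{\pi/\omega_i}\}\Bigr|
    = \frac{\pi}{\pi/\omega_i} = \omega_i
    \quad \text{for } v_i \in V_T,
\qquad
t_i = 0 \quad \text{for } v_i \in V \setminus V_T .
```

Hence $x = y + rt$ states that each timed actor $v_i$ fires exactly $r\,\omega_i$ times, that is $r$ times its frequency, in a consistent execution.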


Theorem 1. A polygraph $P$ has a consistent execution if and only if there exists
a non-trivial solution $x \in \mathbb{N}^{|V| \times 1}$ to $\Gamma x = 0$ such that $x = y + rt$ for some $y \in Y$
and $r \in \mathbb{N}$. Any such solution is called a repetition vector of $P$. Moreover, there
exists a minimal repetition vector $x$ such that for any other repetition vector $x'$
we have $x' = kx$ for some $k \in \mathbb{N}$.
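Theorem 1 turns consistency into linear algebra over $\mathbb{Q}$. The following minimal Python sketch computes the minimal repetition vector under the assumptions that the system graph is connected and the rates follow Definition 1. It uses the classic SDF technique of propagating the balance equations along the channels (our choice of method, not one prescribed by the text), then imposes $x_i = r\,\omega_i$ for every timed actor, using the fact that a timed actor has $\omega_i$ firing ticks per clock period, so $t_i = \omega_i$. The encoding of channels and rates is hypothetical.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+


def repetition_vector(n, channels, rates, freq):
    """Minimal repetition vector of Theorem 1, or None if there is none.

    channels[i] = (j, k) and rates[i] = (gamma_ij, gamma_ik) with
    gamma_ij > 0 > gamma_ik; freq maps timed actor indices to omega_i
    (so t_i = omega_i). The system graph is assumed connected.
    """
    # 1. Solve the balance equations Gamma x = 0 by propagation from v_0.
    x = [None] * n
    x[0] = Fraction(1)
    adj = {v: [] for v in range(n)}
    for (j, k), (p, c) in zip(channels, rates):
        if j != k:  # self-loops have zero rates
            p, c = Fraction(p), Fraction(c)
            adj[j].append((k, p / -c))  # x_k = p / |c| * x_j
            adj[k].append((j, -c / p))  # x_j = |c| / p * x_k
    stack = [0]
    while stack:
        v = stack.pop()
        for u, factor in adj[v]:
            if x[u] is None:
                x[u] = x[v] * factor
                stack.append(u)
    if any(xi is None for xi in x):
        raise ValueError("the system graph is not connected")
    for (j, k), (p, c) in zip(channels, rates):
        if j != k and Fraction(p) * x[j] + Fraction(c) * x[k] != 0:
            return None  # only the trivial solution: not consistent
    # 2. Smallest positive integer vector in that direction.
    scale = lcm(*(xi.denominator for xi in x))
    x = [xi * scale for xi in x]
    # 3. Frequencies: x_i = r * omega_i with the same r for every timed actor.
    ratios = {x[i] / w for i, w in freq.items()}
    if len(ratios) > 1:
        return None
    if ratios:
        k = ratios.pop().denominator  # smallest multiplier making r an integer
        x = [xi * k for xi in x]
    return [int(xi) for xi in x]
```

For a hypothetical chain $v_0 \to v_1 \to v_2$ with rates $(1, -1/2)$ and $(1, -4)$, $\omega_0 = 2$ and $\omega_2 = 1$, this gives $x = (2, 4, 1) = y + rt$ with $y = (0, 4, 0)$, $t = (2, 0, 1)$ and $r = 1$.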


Sketch of proof. First, we prove that the condition is sufficient, and suppose that
there exists such a solution $x$. Then we can decompose:

$$x = y + \underbrace{(t^0 + \dots + t^{\pi-1}) + \dots + (t^0 + \dots + t^{\pi-1})}_{=\, rt}$$

The required consistent execution can be obtained by constructing sub-executions
corresponding to this decomposition, relying on the definition of synchronous
executions and on Remark 1.


Claim (1). There exists a synchronous execution $e_1 \in S^+$ with starting state
$s = (m, 0, 0)$ and ending state $s' = (m + \Gamma y, 0, 0)$.


The execution $e_1$ is constructed by applying $y_i$ firings of each actor $v_i \in V \setminus V_T$
(in any order). Since the fired actors are not timed actors, any such sequence is
synchronous. The resulting channel state is $m + \Gamma y$ as per Remark 1.


Claim (2). For any starting state $s = (c, \tau, 0)$, there exists a synchronous execution
$e_2 \in S^+$ starting from $s$ with ending state $s' = (c + \Gamma t^\tau, (\tau + 1) \bmod \pi, 0)$.


The execution $e_2$ for $\tau$ is constructed by firing exactly once each timed actor
supposed to do so at tick $\tau$, and then applying the tick mapping.


Claim (3). For any starting state $s = (c, 0, 0)$, there exists a synchronous execution
$e_3 \in S^+$ starting from $s$ with ending state $s' = (c + \Gamma t, 0, 0)$.


The execution $e_3$ is obtained by successively executing $e_2$ for $\tau = 0, \dots, \pi - 1$.


Claim (4). There exists a synchronous execution $e_4 \in S^+$ with starting state
$s = (m, 0, 0)$ and ending state $s' = (m + \Gamma(y + rt), 0, 0)$.


The sequence $e_4$ is constructed by executing $e_1$, followed by $e_3$ repeated $r$ times.
Hence, given that $\Gamma x = 0$ and $x = y + rt$, the ending state of $e_4$ is
$(m + \Gamma x, 0, 0) = (m, 0, 0)$, i.e. the same as its starting state, and $e_4$ is consistent.
The fact that the condition is also necessary follows from the definitions. Since the current tick
must return to 0 after a consistent execution, such an execution must perform a
number $r$ of periods of the global clock for some $r \in \mathbb{N}$; in other words, it must
contain $r\pi$ applications of the tick mapping and $r t_i$ firings of each timed actor
$v_i$. The existence of a minimal solution immediately follows from the fact that
in this case $\mathrm{rank}(\Gamma) = |V| - 1$ according to [14, Corollary of Lemma 2].
Due to lack of space, a detailed proof is left to the reader.
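The decomposition used in the proof can be turned into an explicit action sequence. The Python sketch below (hypothetical names; actions encoded as `("fire", i)` and `("tick",)`, firing ticks as sets of ticks per timed actor) builds $e_4$ from $y$, $r$, $\pi$ and the firing ticks, and computes the net channel change, $\Gamma$ times the firing counts, as allowed by Remark 1.

```python
from collections import Counter
from fractions import Fraction


def consistent_actions(y, r, pi, firing_ticks):
    """Action sequence of e_4 in the proof of Theorem 1: e_1, then e_3 r times."""
    # e_1: y_i firings of each untimed actor, in any order.
    actions = [("fire", i) for i, yi in enumerate(y) for _ in range(yi)]
    for _ in range(r):                      # e_3, repeated r times
        for tau in range(pi):               # e_2 for tau = 0, ..., pi - 1
            actions += [("fire", i) for i in sorted(firing_ticks)
                        if tau in firing_ticks[i]]
            actions.append(("tick",))
    return actions


def channel_delta(gamma, actions):
    """Net change of the channel state, Gamma times the firing counts (Remark 1)."""
    counts = Counter(a[1] for a in actions if a[0] == "fire")
    return [sum(Fraction(g) * counts[i] for i, g in enumerate(row)) for row in gamma]
```

For a hypothetical chain $v_0 \to v_1 \to v_2$ with rates $(1, -1/2)$ and $(1, -4)$, $y = (0, 4, 0)$, $r = 1$, $\pi = 2$, $T_0 = \{0, 1\}$ and $T_2 = \{0\}$, the sequence contains $r\pi = 2$ ticks and the net change is zero. Note that $v_1$ fires four times before $v_0$ produces anything, driving the first channel state to −2 from an empty marking: the execution is synchronous but not non-blocking, which is exactly the gap addressed by the liveness property.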


Liveness property. If verified, the liveness property of $P$ guarantees that it is
possible to build a consistent execution $e = s_1 \cdots s_n \in S^+$ such that $e$ is also a
non-blocking execution. Such an execution $e$ is called a live execution.

In a way similar to [14, Theorem 3], we define the notion of a scheduler 
building only synchronous and non-blocking executions. Our goal is to show that 
P has a live execution if and only if any such scheduler can build a consistent 
execution. 

From now on, we consider that $P$ is consistent with minimal repetition vector
$x$. We define the mapping $count : V \times S^+ \to \mathbb{N}$ that, given an actor $v_i$ and an
execution $e = s_1 \cdots s_n \in S^+$, returns the number of firings of $v_i$ in $e$, i.e. the
number of $k$ such that $1 \leq k < n$ and $s_{k+1} = fire(u_i, s_k)$. Notice that since a live
execution $e$ of $P$ is also consistent, by definition we have $\forall v_i \in V$, $count(v_i, e) =
x_i$. Also, we say that an actor $v_i \in V$ is runnable after an execution $e \in S^+$
with ending state $s$ if $count(v_i, e) < x_i$ and the one-step execution $s s' \in S^+$ with
$s' = fire(u_i, s)$ is synchronous and non-blocking.


A Data Flow Model with Frequency Arithmetic 381 


Definition 10 (Scheduler). A scheduler of $P$ is a mapping $\sigma : S^+ \to S^+$
that maps an execution $e = s_1 \cdots s_n \in S^+$ to an execution $e' \in S^+$ such that if
we denote $s_n = (c, \tau, a)$ we have:

– either $e' = s_1 \cdots s_n s' \in S^+$ with $s' = fire(u_i, s_n)$ for some actor $v_i$ runnable
after $e$;

– or $e' = s_1 \cdots s_n s' \in S^+$ with $s' = tick(s_n)$ and $a = t^\tau$;

– or $e' = e$ if there is no runnable actor after $e$ and $a \neq t^\tau$.


An execution defined by a scheduler $\sigma$ is the fixed point constructed by
recursive application¹ of $\sigma$ starting from the initial execution $e = ((m, 0, 0))$,
made of the single state $(m, 0, 0)$.


Theorem 2. Let $P$ be a consistent polygraph with minimal repetition vector $x$,
$\sigma$ a scheduler of $P$, and $e$ the execution defined by $\sigma$. Then $P$ has a live execution
if and only if $\forall v_i \in V$, $count(v_i, e) = x_i$.


Sketch of proof. The condition is obviously sufficient. The proof that it is also
necessary can be easily made by induction. If $e$ is a live execution and $e'$ is a
synchronous and non-blocking execution constructed by $\sigma$ so far, with $|e'| < |e|$,
we can show that $e'$ can be extended by one more step (e.g. by taking the first
step present in $e$ but not in $e'$, since its preconditions are necessarily satisfied).
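Theorem 2 implies that a single run of any scheduler decides liveness. The following minimal Python sketch implements one particular scheduler: it fires the lowest-index runnable actor and ticks only when no actor is runnable. The greedy policy, the step bound `max_steps`, and stopping once the clock is back at tick 0 with all firings done are our own choices, not part of Definition 10.

```python
from fractions import Fraction


def greedy_schedule(gamma, x, m, pi, firing_ticks, max_steps=100_000):
    """Run one greedy scheduler (Definition 10) and apply the test of Theorem 2.

    gamma: one row per channel; x: minimal repetition vector; m: initial marking;
    firing_ticks: timed actor index -> set T_i. Returns (live, counts).
    """
    n = len(x)
    c = [Fraction(v) for v in m]
    tau, a, counts = 0, [0] * n, [0] * n

    def runnable(i, t):
        if counts[i] >= x[i]:
            return False
        if i in firing_ticks and not a[i] < t[i]:        # synchronous
            return False
        return all(cj >= -Fraction(row[i])                 # non-blocking
                   for cj, row in zip(c, gamma) if row[i] < 0)

    for _ in range(max_steps):
        t = [1 if tau in firing_ticks.get(i, ()) else 0 for i in range(n)]
        ready = [i for i in range(n) if runnable(i, t)]
        if ready:
            i = ready[0]                     # greedy choice: lowest index
            c = [cj + Fraction(row[i]) for cj, row in zip(c, gamma)]
            counts[i] += 1
            if i in firing_ticks:
                a[i] += 1
        elif a == t:
            tau, a = (tau + 1) % pi, [0] * n  # tick
            if tau == 0 and counts == list(x):
                break                         # clock back to 0, all firings done
        else:
            break                             # fixed point: no step is allowed
    return counts == list(x), counts
```

On a hypothetical chain $v_0 \to v_1 \to v_2$ with rates $(1, -1/2)$ and $(1, -4)$, $\omega_0 = 2$, $\omega_2 = 1$ and $x = (2, 4, 1)$, an empty marking is live when $v_2$ has phase 1 but deadlocks after firing counts $(1, 2, 0)$ when its phase is 0; an initial marking of 4 tokens on the second channel removes that deadlock. This mirrors the roles of the marking and the phase discussed in Sect. 2.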


4 Tool Support for Liveness Checking 


DIVERSITY is a customizable model analysis tool based on symbolic execution,
available in the Eclipse Formal Modeling Project [17]. DIVERSITY provides a
pivot language called xLIA (eXecutable Language for Interaction and Architecture)
introducing a set of communication and execution primitives allowing
one to encode a wide class of dynamic model semantics [2,9], Communicating
STS [1], and abstractions of hybrid systems [15]. In this work, we use it to analyze
Polygraph models and to check their liveness in a similar way to that defined
by a scheduler as per Definition 10.

The root entity in an xLIA model is a so-called system. A system is an 
executable entity that can be atomic (state-machine) or compositional or hier- 
archical. A Polygraph model translated to xLIA is a system where the actors are 
state-machines with input/output ports associated with the ends of the channels. 
They communicate asynchronously over FIFO queues, bounded or not, using 
xLIA connectors. Variables are used to store received tokens on input instruc- 
tions in transitions, with guards conditioning their firing, and output statements 
to model their token productions. 

Figure 3 represents such a state machine for any actor of the polygraph in
Fig. 1. Each transition is labeled with xLIA macros representing the actions
performed. The init macro moves the initial marking from the input queues to the
counter of available input tokens, canFire() tests if enough tokens are present
for a non-blocking firing, consumption decrements the counter of available input
tokens, production sends the production rate on the successor's queue, and
reception reads that rate and adds it to the number of available tokens.
Regarding state machine semantics, all the states are pseudo-states, except idle,
which is stable. This means that any fired transition must be completed until
returning to the idle state. The else transition will be evaluated if there is no
possible reception.

Fig. 3. xLIA state machine pattern for an actor of a polygraph

¹ Hence, a scheduler can also be defined as a partial mapping on $\sigma^*((m, 0, 0))$.

The xLIA language allows a fine-grained definition of an execution model for
the actors of a polygraph. Some instructions associate a sequence of actors to
fire with each tick of a clock. When attempting to fire a timed actor, only one
firing is triggered if possible, and when attempting the same for other actors, as
many firings as possible are triggered. Hence, the timed actors are only fired at
the expected tick, and a deadlock is reported if this is not possible. For the other
actors, a counter limits their number of firings to their coordinate in the minimal
repetition vector, as required by Theorem 2. With this setup, for a polygraph $P$
with minimal repetition vector $x = y + rt$, the length of a live execution path
is $r\pi$, plus one for the initialization step handling the initial marking. Any path
with fewer steps leads to a deadlock.

We tested this technique using DIVERSITY on an Intel Core i7. For the polygraph
of Fig. 1 with initial marking (ii), the tool finds that the liveness property
is verified. We also tested the initial marking (i), and the tool correctly identified
a deadlock in less than 200 ms. This example is extracted from a more complex
polygraph modeling an Advanced Driver-Assistance System (ADAS), which we
also used to evaluate the liveness checking tool. The considered polygraph has
18 actors (5 of which are timed actors) and 32 channels (6 of which have an initial
marking), and 10 of its actors have rational communication rates. For a correctly
marked model, we find a live execution sequence in 4 s.


5 Discussion and Related Work 


In [16], an extension to SDF is proposed to add a single throughput constraint on 
a channel of a consistent graph. From this constraint, a firing frequency is derived 
for the actors by transitivity. This approach, while preserving the consistency 
property by construction, does not allow the expression of a frequency constraint
per actor, based on a real-life constraint on the modeled component, nor the
explicit synchronization of the firings on a reference time scale. 

The programming model PTIDES [18] combines a real-time semantics for
sensors and actuators, and a discrete-event semantics for other components like
computation kernels. These other components have an awareness of real time
through a logical time abstraction. The resulting execution semantics has
similarities with Polygraph, since some components are constrained by real time and
others only react to their stimuli. The semantics of PTIDES is much more flexible
than that of Polygraph, since it does not require fixed production or consumption
rates. On the other hand, and as opposed to Polygraph, there is no way to derive
a consistent and live periodic schedule in PTIDES, which makes static performance
prediction more difficult. Nevertheless, since the semantics are similar,
we believe that the notion of logical time as defined in PTIDES is applicable to
practical distributed implementations of polygraphs.

Synchronous programming languages [7,8] can be used to express a data flow
between synchronous periodic nodes, in order to generate correct-by-construction
programs. In these approaches, all the nodes are synchronous, while in Polygraph,
some actors fire asynchronously when enabled. Also, the goal of our approach
is to be able to reason formally on the modeled systems, and to automate as
many tasks as possible in their design, implementation and validation. Such a task
could be the association of the asynchronous firings to ticks of the global clock,
and the generation of a synchronous program for automatic code generation.

Recently published research [6] follows a similar approach to ours. By mixing 
elements from two existing formalisms, one allowing the specification of time- 
triggered tasks and the other the specification of data flow actors, the expressive- 
ness of the resulting modeling framework is comparable to that of Polygraph. The 
main difference is that Polygraph is a single formalism with decidable properties 
and algorithms to check them in practice. In [6], the impact of the combination 
of constraints from two different formalisms on their respective properties is not 
discussed, as the proposed approach is more focused on the performance evalua- 
tion. The experimental results the authors obtained are in favor of the modeling 
approach we have in common. 


6 Conclusion 


We have introduced Polygraph, a data flow formalism extending SDF with
synchronous firing semantics for the actors. We have shown that with this extension,
the existing conditions to decide a given SDF graph's consistency and liveness
are no longer sufficient. We have extended the corresponding theorems and
shown that the expressiveness extensions we proposed do not impact the
decidability of these properties. Finally, as a first step towards tool-assisted modeling
of polygraphs, we have introduced a framework relying on DIVERSITY to verify
their liveness.

Our next step is to further extend Polygraph to add flexibility in the execution
semantics, with the same objective to preserve the capability to perform
accurate static analysis of a system's performance. Still, with this first extension,
there are already interesting research perspectives regarding the applicability of
existing static performance analysis techniques, and their potential extensions
to take into account the specifics of a polygraph's scheduling.
Abstract. Testing is a widely used method to assess software quality.
Coverage criteria and coverage measurements are used to ensure that 
the constructed test suites adequately test the given software. Since 
manually developing such test suites is too expensive in practice, various 
automatic test-generation approaches were proposed. Since all approaches 
come with different strengths, combinations are necessary in order to 
achieve stronger tools. We study cooperative combinations of verification 
approaches for test generation, with high-level information exchange. 

We present CoVeriTest, a hybrid approach for test-case generation, which
iteratively applies different conditional model checkers. Thereby, it allows
one to adjust the level of cooperation and to assign individual time budgets
per verifier. In our experiments, we combine explicit-state model checking
and predicate abstraction (from CPAchecker) to systematically study
different CoVeriTest configurations. Moreover, CoVeriTest achieves higher
coverage than state-of-the-art test-generation tools for some programs.


Keywords: Test-case generation · Software testing · Test coverage ·
Conditional model checking · Cooperative verification · Model checking


1 Introduction 


Testing is a commonly used technique to measure the quality of software. Since
manually creating such test suites is laborious, automatic techniques are used: e.g.,
model-based techniques for black-box testing and techniques based on control-flow
coverage for white-box testing. Many automatic techniques have been proposed,
ranging from random testing [36,57] and fuzzing [26,52,53], over search-based
testing [55] to symbolic execution [23,24,58] and reachability analyses [5,12,45,46].
The latter are well-suited to find bugs and derive test suites that achieve high
coverage, and several verification tools support test generation (e.g., Blast [5],
PathFinder [61], CPAchecker [12]). Reachability checks for all test goals may seem
too expensive, but in practice, those approaches can be made quite efficient.
Encouraged by tremendous advances in software verification [3] and a recent
case study that compared model checkers with test tools w.r.t. bug finding [17],
we study a new kind of combination of reachability analyses for test generation.
Combinations are necessary because different analysis techniques have different
strengths and weaknesses. For example, consider function foo in Listing 1.
Explicit-state model checking [18,33] tracks the values of variables i and s and
easily detects the reachability of the statements in the outermost if branch
(lines 3–6), while it has difficulties with the complex condition in the else-branch
(line 8). In contrast, predicate abstraction [33,39] can easily derive test values for
the complex condition in line 8, but to handle the if branch (lines 3–6) it must
spend effort on the detection of the predicates s=0, s=1, and i=0. Independently of
each other, test approaches [1,34,47,54] and verification approaches [9,10,29,37]
employ combinations to tackle such problems. However, there are no approaches
yet that combine different reachability analyses for test generation.

    0 void foo(int i, int n) {
    1   int s=0;
    2   if (i == 0)
    3     while (i == 0) {
    4       if (s==0) …;
    5       if (s==1) i = exec();
    6       s = (s+1)%2;
    7     }
    8   else if (2*i<n && i>0) exec();
    9 }

Inspired by abstraction-driven concolic testing [32], which interleaves concolic
execution and predicate abstraction, we propose CoVeriTest, which stands
for cooperative verifier-based testing. CoVeriTest iteratively executes a given
sequence of reachability analyses. In each iteration, the analyses are run in
sequence and each analysis is limited by its individual, but configurable time limit.
Furthermore, CoVeriTest allows the analyses to share various types of analysis
information, e.g., which paths are infeasible, which have already been explored, or
which abstraction level to use. To get access to a large set of reachability analyses,
we implemented CoVeriTest in the configurable software-analysis framework
CPAchecker [15]. We used our implementation to evaluate different CoVeriTest
configurations on a large set of well-established benchmark programs and to compare
CoVeriTest with existing state-of-the-art test-generation techniques. Our
experiments confirm that reachability analyses are valuable for test generation.
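The interleaving described above can be sketched as a loop. This is a hypothetical Python sketch, not CoVeriTest's implementation (which lives in CPAchecker): each analysis is modeled as a callable that receives the open goals, its time budget in integer units and a dictionary shared by all analyses, and returns the goals it covered; for simplicity, each run is charged its whole budget.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical interface: an analysis receives the open goals, its time budget
# (integer units) and a dict shared by all analyses; it returns covered goals.
Analysis = Callable[[FrozenSet[str], int, Dict], Set[str]]


def interleave(analyses: List[Tuple[Analysis, int]], goals: Set[str],
               total_budget: int) -> Set[str]:
    """Run the analyses in a fixed sequence, round after round, each with its own budget."""
    if not analyses or any(budget <= 0 for _, budget in analyses):
        raise ValueError("need at least one analysis, each with a positive budget")
    open_goals: Set[str] = set(goals)
    covered: Set[str] = set()
    shared: Dict = {}  # stands in for exchanged information (ARGs, precisions, ...)
    spent = 0
    while open_goals and spent < total_budget:
        for analysis, budget in analyses:
            if not open_goals or spent >= total_budget:
                break
            run_budget = min(budget, total_budget - spent)
            newly = analysis(frozenset(open_goals), run_budget, shared) & open_goals
            spent += run_budget  # simplification: a run always uses its whole budget
            covered |= newly
            open_goals -= newly
    return covered
```

A real configuration would additionally decide which information (for instance ARGs or precisions) each analysis reuses from the previous round; in this sketch that choice is left to the callables through the `shared` dictionary.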
Fig. 1. Example program foo

Contributions. In summary, we make the following contributions:

• We introduce CoVeriTest, a flexible approach for high-level interleaving of
reachability analyses with information exchange for test generation.

• We perform an extensive evaluation of CoVeriTest studying 54 different
configurations and two state-of-the-art test-generation tools¹.

• CoVeriTest and all our experimental data are publicly available² [13].


2 Testing with Verifiers 


The basic idea behind testing with verifiers is to derive test cases from counter- 
examples [5,61]. Thus, meeting a test goal during verification has to trigger a 
specification violation. First, we remind the reader of some basic notations. 


¹ We chose the two best tools, VeriFuzz and KLEE, from the international competition
on software testing (Test-Comp 2019) [4]. https://test-comp.sosy-lab.org/2019/
² https://www.sosy-lab.org/research/coop-testgen/
Programs. Following literature [9], we represent programs by control-flow
automata (CFAs). A CFA $P = (L, \ell_0, G)$ consists of a set $L$ of program locations
(the program-counter values), an initial program location $\ell_0 \in L$, and a set of
control-flow edges $G \subseteq L \times Ops \times L$. The set $Ops$ describes all possible operations,
e.g., assume statements (resulting from conditions in if or while statements) and
assignments. For the program semantics, we rely on an operational semantics,
which we do not further specify.

Abstract Reachability Graph (ARG). ARGs record the work done by reachability
analyses. An ARG is constructed for a program $P = (L, \ell_0, G)$ and stores
(a) the abstract state space that has been explored so far, (b) which abstract states
must still be explored, and (c) what abstraction level (tracked variables, considered
predicates, etc.) is used. Technically, an ARG is a five-tuple $(N, succ, root, F, \pi)$
that consists of a set $N$ of abstract states, a special node $root \in N$ that represents
the initial states of program $P$, a relation $succ \subseteq N \times G \times N$ that records already
explored successor relations, a set $F \subseteq N$ of frontier nodes, which remembers
all nodes that have not been fully explored, and a precision $\pi$ describing the
abstraction level. Every ARG must ensure that a node $n$ is either contained in $F$
or completely explored, i.e., all abstract successors have been explored. We use
ARGs for information exchange between reachability analyses.
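To make the five-tuple concrete, here is a hypothetical Python encoding of CFAs and ARGs together with the well-formedness requirement stated above. Abstract states are plain strings, and the `abstract_successors` mapping is an assumed oracle standing in for the analysis' transfer relation, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[int, str, int]  # control-flow edge (location, operation, location)


@dataclass(frozen=True)
class CFA:
    locations: FrozenSet[int]
    initial: int
    edges: FrozenSet[Edge]

    def __post_init__(self):
        if self.initial not in self.locations:
            raise ValueError("initial location must belong to L")
        if any(l1 not in self.locations or l2 not in self.locations
               for l1, _, l2 in self.edges):
            raise ValueError("edges must connect locations of L")


@dataclass
class ARG:
    nodes: Set[str]
    succ: Set[Tuple[str, Edge, str]]
    root: str
    frontier: Set[str]
    precision: FrozenSet[str] = frozenset()  # e.g. tracked variables or predicates

    def is_well_formed(self, abstract_successors: Dict[str, Set[Tuple[Edge, str]]]) -> bool:
        """Each node is in the frontier F or all its abstract successors are in succ."""
        if self.root not in self.nodes or not self.frontier <= self.nodes:
            return False
        for n in self.nodes:
            explored = {(e, n2) for (n1, e, n2) in self.succ if n1 == n}
            if n not in self.frontier and not abstract_successors.get(n, set()) <= explored:
                return False
        return True
```

Removing a node from the frontier without recording all of its successors in `succ` makes the ARG ill-formed, which is exactly the invariant that lets another analysis resume from it.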

Test Goals. In this paper, we are interested in structural coverage, e.g., branch
coverage. Transferred to our notion of programs, this means that our test goals are
a subset of the program's control-flow edges. For using a verifier to generate tests,
we have to encode the test goals as a specification violation. Figure 2 shows a
possible encoding, which uses a protocol automaton. Whenever a test goal is
executed, the automaton transits from the initial, safe state $q_0$ to the accepting
state $q_e$, which marks a property violation. Note that reachability analyses, which
we consider for test generation, can easily monitor such specifications during
exploration.

Fig. 2. Encoding test goals as specification violation
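The protocol automaton can be simulated along a single program path. A minimal Python sketch, assuming edges are `(location, operation, location)` tuples and that the automaton stays in $q_0$ on every edge that is not a goal (our reading of the encoding); `goal_monitor` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, str, int]  # (location, operation, location)


def goal_monitor(path: Iterable[Edge], goals: Set[Edge]) -> Tuple[str, int]:
    """Protocol automaton for test goals, run along one program path.

    Assumes the automaton stays in q0 on every non-goal edge and moves to the
    accepting state qe on a goal edge. Returns the reached state and the
    number of edges consumed (the path prefix ends with the goal edge).
    """
    steps = 0
    for edge in path:
        steps += 1
        if edge in goals:
            return "qe", steps   # specification violated: goal reached
    return "q0", steps
```

Stopping at the first goal edge matches the observation that a counterexample trace ends with the test goal it covers.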

Now, we have everything at hand to describe how reachability analyses 
generate tests. Algorithm 1 shows the test-generation process. The algorithm gets 
as input a program, a set of test goals, and a time limit for test generation. For 
cooperative test generation, we need to guide state-space explorations. To this 
end, we also provide an initial ARG and a condition. A condition is a concept 
known from conditional model checking [10] and describes which parts of the state 
space have already been explored by other verifiers. A verifier, e.g., a reachability 
analysis, can use a condition to ignore the already explored parts of the state 
space. Verifiers that do not understand conditions can safely ignore them. 
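The control flow of Algorithm 1 can be sketched in Python as follows. This is a simplification with hypothetical names: the ARG and the condition are hidden inside an `explore` callable that resumes exploration for the open goals and returns a counterexample trace ending in a goal edge, or `None` when the frontier is empty or the time is up; `make_test` stands in for the construction of a test from a satisfying assignment of the path formula.

```python
import time
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[int, str, int]
Trace = List[Edge]
# Hypothetical stand-in for resuming the ARG exploration: given the open goals
# (the current specification) and the remaining time, it returns a counterexample
# trace ending in an open goal edge, or None if F is empty or the time is up.
Explore = Callable[[FrozenSet[Edge], float], Optional[Trace]]


def generate_tests(explore: Explore, make_test: Callable[[Trace], object],
                   goals: Set[Edge], limit: float):
    start = time.monotonic()
    test_suite, covered, open_goals = [], set(), set(goals)
    while open_goals and time.monotonic() - start < limit:
        trace = explore(frozenset(open_goals), limit - (time.monotonic() - start))
        if trace is None:            # state space exhausted or time limit reached
            break
        test_suite.append(make_test(trace))
        goal = trace[-1]             # last edge = the covered test goal
        open_goals.discard(goal)     # corresponds to regenerating the specification
        covered.add(goal)
    return test_suite, covered
```

Passing the shrinking set of open goals to each `explore` call plays the role of `generate_specification`, so goals that are already covered no longer trigger violations.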

Algorithm 1. Generating tests with a (conditional) reachability analysis

Input: prog = (L, ℓ0, G), goals ⊆ G, limit ∈ N, arg = (N, succ, root, F, π),
       condition ψ
Output: generated test_suite, covered goals, updated arg

test_suite = ∅; covered = ∅;
φ = generate_specification(goals);
while (goals ≠ ∅ and arg.F ≠ ∅ and elapsed_time < limit) do
    arg = explore(prog, φ, arg, ψ, limit − elapsed_time);
    if (arg.F ≠ ∅ and elapsed_time < limit) then
        τ = extract_counterexample_trace(arg);
        test_suite = test_suite ∪ generate_test_from_trace(τ);
        goals = goals \ {last_edge(τ)}; covered = covered ∪ {last_edge(τ)};
        φ = generate_specification(goals);
return (test_suite, covered, arg);

At the beginning, Alg. 1 sets up the data structures for the test suite and the
set of covered goals. To set up the specification, it follows the idea of Fig. 2. As
long as not all test goals are covered, there exist abstract states that must be
explored, and the time limit has not elapsed, the algorithm tries to generate new
tests. Therefore, it resumes the exploration of the current ARG [5] taking into
account program prog, specification $\varphi$, and (if understood) the condition $\psi$.
If the exploration stops, then it returns an updated ARG. Exploration stops
for one of three reasons: (1) the state space is explored completely (F = ∅),
(2) the time limit is reached, or (3) a counterexample has been found.³ In the
latter case, a new test is generated. First, a counterexample trace is extracted
from the ARG. The trace describes a path through the ARG that starts at the
root and whose last edge is a test goal (the reason for the specification violation).
Next, a test is constructed from the path and added to the test suite. Basically,
the path is converted into a formula, and a satisfying assignment⁴ is used as
the test case. For the details, we refer the reader to the work that defined the
method [5]. Additionally, the covered goal (last edge on the counterexample path)
is removed from the set of open test goals and added to the set of covered goals.
Finally, the specification is updated to no longer consider the covered goal. When
the algorithm finishes, it returns the generated test suite, the set of covered goals,
and the last ARG considered. The ARG is returned to enable cooperation.
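The loop structure of Alg. 1 can be illustrated with a small Python sketch. It is not the ARG-based method of [5]: as a stated simplification, the frontier holds concrete candidate inputs instead of abstract states, so the test value is known directly instead of being obtained as a satisfying assignment of a path formula. The toy program, explore, and generate_tests are hypothetical names, and the condition input ψ is omitted (which a verifier may do anyway).

```python
import time
from typing import Callable, List, Optional, Set, Tuple

Edge = Tuple[str, str]
Trace = List[Edge]

def toy_program(x: int) -> Trace:
    """Hypothetical program: returns the control-flow edges executed for input x."""
    path = [("l0", "l1")]
    if x > 10:
        path.append(("l1", "l2"))
        path.append(("l2", "l3") if x % 2 == 0 else ("l2", "l4"))
    else:
        path.append(("l1", "l5"))
    return path

def explore(prog: Callable[[int], Trace], spec: Set[Edge], frontier: List[int],
            deadline: float) -> Optional[Tuple[Trace, int]]:
    """Stand-in for explore(): stops at the first path executing a goal in spec."""
    while frontier and time.monotonic() < deadline:
        x = frontier[-1]
        trace = prog(x)
        for i, edge in enumerate(trace):
            if edge in spec:
                return trace[: i + 1], x   # counterexample; keep x for other goals
        frontier.pop()                     # x reaches no open goal: done with it
    return None

def generate_tests(prog: Callable[[int], Trace], goals: Set[Edge], limit: float,
                   frontier: List[int]):
    test_suite: List[int] = []
    covered: Set[Edge] = set()
    goals = set(goals)
    spec = set(goals)                      # generate_specification(goals)
    deadline = time.monotonic() + limit
    while goals and frontier and time.monotonic() < deadline:
        result = explore(prog, spec, frontier, deadline)
        if result is not None:
            trace, x = result
            test_suite.append(x)           # test derived from the trace
            goals.discard(trace[-1])
            covered.add(trace[-1])
            spec = set(goals)              # stop monitoring the covered goal
    return test_suite, covered, frontier
```

Running generate_tests(toy_program, goals, 5.0, list(range(20))) with the four branch edges as goals covers all of them with the tests [19, 19, 18, 10]; as in Alg. 1, one input may yield several tests if its path reaches several goals.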


3 CoVeriTest


The previous section described how to use a single reachability analysis to
produce tests for covering a set of test goals. Due to different strengths and
weaknesses, some test goals are harder to cover for one analysis than for another.
To maximize the number of covered goals, different analyses should be combined. In
CoVeriTest, we rotate analyses for test generation. This prevents analyses from
trying to cover the same goal in parallel, and we do not need to know in advance
which analysis can cover which goals. Moreover, analyses that get stuck trying to
cover goals that other analyses handle later get a chance to recover. Additionally,
CoVeriTest supports cooperation among analyses. More concretely, analyses may
extract and use information from ARGs constructed by previous analysis runs.

3 We assume that an exploration is only complete if no counterexample exists.
4 We assume that only feasible counterexamples are contained and infeasible
counterexamples were eliminated by the reachability analysis during exploration.

Algorithm 2. CoVeriTest: alternating reachability analyses to generate tests
Input: prog = (L, ℓ0, G), goals ⊆ G, total_limit ∈ ℕ, configs ∈ (analysis × ℕ)⁺
Output: test_suite
1:  test_suite = ∅; args = ⟨⟩; current = 0;
2:  while (goals ≠ ∅ and elapsed_time < total_limit) do
3:      analysis = configs[current].first; limit = configs[current].second;
4:      (arg, ψ) = cooperateAndInit(prog, args, configs.length);
5:      (tests, covered, arg) = analysis(prog, goals, limit, arg, ψ);
6:      test_suite = test_suite ∪ tests; goals = goals \ covered; args = args ∘ ⟨arg⟩;
7:      if (arg.F = ∅) then
8:          return test_suite;
9:      current = (current + 1) % configs.length;
10: return test_suite;

Algorithm 2 describes the CoVeriTest workflow. It gets four inputs. Program,
test goals, and time limit are already known from Alg. 1 (test generation with
a single analysis). Additionally, CoVeriTest gets a sequence of configurations,
namely pairs of reachability analysis and time limit. The time limit accompanying
an analysis restricts the runtime of that analysis per call (see
line 5). In contrast to Alg. 1, CoVeriTest does not get an ARG or a condition. To
enable cooperation between analyses, CoVeriTest constructs these two elements
individually for each analysis run. During construction, it may extract and use
information from results of previous analysis runs.

Algorithm 3. cooperateAndInit: set up start point for analysis exploration,
possibly transferring knowledge from previous analysis runs
Input: prog = (L, ℓ0, G), args ∈ ⟨arg⟩*, numAnalyses ∈ ℕ
Output: ARG for program prog, condition describing explored state space
1:  ψ = false; π = ∅; root = (ℓ0, ⊤);
2:  if (length(args) ≥ numAnalyses) then
3:      if (reuse-arg) then
4:          return (last_arg_of_analysis(numAnalyses, args), ψ);
5:      if (reuse-precision) then
6:          π = last_arg_of_analysis(numAnalyses, args).π;
7:  if (use-condition ∧ length(args) > 0) then
8:      ψ = extract_condition(args[length(args) − 1]);
9:  return (({root}, ∅, root, {root}, π), ψ);

After initializing the test suite and the data structure that stores analysis
results (args), CoVeriTest repeatedly iterates over the configurations. It starts
with the first pair in the sequence and stops iterating when its time limit is
exceeded or all goals are covered. In each iteration, CoVeriTest first extracts the
analysis to execute and its accompanying time limit (line 3). Then, it constructs
the remaining inputs of the analysis: ARG and condition. The construction is
detailed in Alg. 3 and explained below. Next, CoVeriTest executes the current
analysis with the given program, the remaining test goals, the accompanying time
limit, and the constructed ARG and condition. When the analysis has finished,
CoVeriTest adds the returned tests to its test suite, removes all test goals
covered by the analysis run from the set of goals, and stores the analysis result for
cooperation (appends arg to the sequence of ARGs). If the analysis finished
its exploration (arg.F = ∅), any remaining test goal should be unreachable and
CoVeriTest returns its test suite. Otherwise, CoVeriTest determines how to
continue in the next iteration (i.e., which configuration to consider). At the end
of all iterations, CoVeriTest returns its generated test suite.
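The rotation of Alg. 2 can be sketched in Python as follows. The analysis interface analysis(prog, goals, limit, arg, cond) -> (tests, covered, arg) mirrors Alg. 1, but ToyARG, plain_init, and make_toy_analysis are hypothetical stand-ins: the toy analyses ignore their time limit, ARG, and condition, and plain_init corresponds to the case without cooperation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Set, Tuple

@dataclass
class ToyARG:
    frontier: Set[str] = field(default_factory=set)  # empty: fully explored

# Hypothetical interface: analysis(prog, goals, limit, arg, cond) -> (tests, covered, arg)
Analysis = Callable[..., Tuple[List[str], Set[str], ToyARG]]

def plain_init(prog, args, num_analyses):
    """No cooperation: no ARG to resume, condition 'nothing explored'."""
    return None, False

def coveritest(prog, goals: Set[str], total_limit: float,
               configs: Sequence[Tuple[Analysis, float]],
               cooperate_and_init=plain_init) -> List[str]:
    test_suite: List[str] = []
    args: List[ToyARG] = []
    current = 0
    goals = set(goals)
    deadline = time.monotonic() + total_limit
    while goals and time.monotonic() < deadline:
        analysis, limit = configs[current]
        arg, cond = cooperate_and_init(prog, args, len(configs))
        tests, covered, arg = analysis(prog, goals, limit, arg, cond)
        test_suite.extend(tests)
        goals -= covered
        args.append(arg)
        if not arg.frontier:       # exploration finished: remaining goals unreachable
            return test_suite
        current = (current + 1) % len(configs)
    return test_suite

def make_toy_analysis(name: str, coverable: Set[str], per_call: int) -> Analysis:
    """Hypothetical analysis covering at most per_call goals per run."""
    def analysis(prog, goals, limit, arg, cond):
        hits = sorted(goals & coverable)[:per_call]
        remaining = goals - set(hits)
        return ([f"{name}:{g}" for g in hits], set(hits),
                ToyARG({"pending"} if remaining else set()))
    return analysis
```

With a value analysis that covers one of g1 to g3 per call and a predicate analysis that covers g4 and g5 at once, the rotation produces the tests value:g1, predicate:g4, predicate:g5, value:g2, value:g3.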

Next, we explain how to construct the ARG and the condition input for 
an analysis. The ARG describes the level of abstraction and where to con- 
tinue exploration while the condition describes which parts of the state space 
have already been explored. Both guide the exploration of an analysis, which 
makes them well-suited for cooperation. While there are plenty of possibilities for 
cooperation, we currently only support three basic options: continue exploration 
of the previous ARG of the analysis (reuse-arg), reuse the analysis’ abstraction 
level (reuse-precision), and restrict the exploration to the state space left out 
by the previous analysis (use-condition). The first two options only ensure that
an analysis does not lose too much information due to switching. The last option,
which is inspired by abstraction-driven concolic execution [32], actually realizes
cooperation between different analyses. Note that the last two options can also
be combined.⁵ If all options are turned off, no information is exchanged.

5 In contrast, the options reuse-arg and use-condition cannot be combined because
they are incompatible: the existing ARG does not fit the constructed condition.
Since reuse-arg subsumes reuse-precision, combining these two makes no sense.

Algorithm 3 shows the cooperative initialization of ARG and condition discussed
above. It gets three inputs: the program, a sequence of ARGs needed to
realize cooperation, and the number of analyses used. At the beginning, it
initializes the ARG components and the condition assuming that no cooperation should
be done. The condition states that nothing has been explored, the abstraction
level becomes the coarsest available, and the ARG root considers the start of all
program executions (initial program location and arbitrary variable values). If
no cooperation is configured or the ARG required for cooperation is not available
(e.g., in the first round), the returned ARG and condition tell the analysis to
explore the complete state space from scratch. In all other cases, the analysis
will be guided by information obtained in previous iterations. Option reuse-arg
looks up the last ARG of the analysis stored in args. Reuse-precision considers
the same ARG as reuse-arg, but only provides the ARG's precision π. For
use-condition, a condition is constructed from the last ARG in args. For the
details of the condition construction, we refer to conditional model checking [10].
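A Python sketch of Alg. 3 may help to see how the three options interact. Two parts are our own simplifications: last_arg_of_analysis reads the ARG numAnalyses positions back in args (the previous run of the same analysis, given the rotation), and extract_condition merely returns the set of fully explored nodes, whereas conditional model checking [10] constructs a condition automaton. The "false" condition is represented by the empty set (nothing explored).

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Sequence, Tuple

@dataclass(frozen=True)
class ARG:
    """Hypothetical immutable ARG (N, succ, root, F, pi)."""
    nodes: FrozenSet[Hashable]
    succ: FrozenSet[Tuple[Hashable, Hashable]]
    root: Hashable
    frontier: FrozenSet[Hashable]
    precision: FrozenSet[str]

Condition = FrozenSet[Hashable]  # simplification: set of explored nodes

def last_arg_of_analysis(num_analyses: int, args: Sequence[ARG]) -> ARG:
    # Analyses rotate, so the ARG num_analyses positions back was produced
    # by the analysis that runs next (our reading of this helper).
    return args[len(args) - num_analyses]

def extract_condition(arg: ARG) -> Condition:
    # Stand-in for condition construction: record the fully explored nodes.
    return arg.nodes - arg.frontier

def cooperate_and_init(l0: Hashable, args: Sequence[ARG], num_analyses: int,
                       reuse_arg: bool = False, reuse_precision: bool = False,
                       use_condition: bool = False) -> Tuple[ARG, Condition]:
    if reuse_arg and use_condition:
        raise ValueError("reuse-arg and use-condition are incompatible")
    cond: Condition = frozenset()            # 'false': nothing explored yet
    precision: FrozenSet[str] = frozenset()  # coarsest abstraction level
    root = (l0, "TOP")                       # initial location, arbitrary values
    if len(args) >= num_analyses:            # this analysis has run before
        if reuse_arg:
            return last_arg_of_analysis(num_analyses, args), cond
        if reuse_precision:
            precision = last_arg_of_analysis(num_analyses, args).precision
    if use_condition and len(args) > 0:
        cond = extract_condition(args[-1])
    fresh = ARG(frozenset({root}), frozenset(), root, frozenset({root}), precision)
    return fresh, cond
```

Note that reuse_arg takes precedence over reuse_precision, reflecting that reuse-arg subsumes reuse-precision.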

Next, we study the effectiveness of different CoVeriTest configurations and
compare CoVeriTest with existing test-generation tools.


4 Evaluation 


We systematically evaluate CoVeriTest along the following claims:

Claim 1. For analyses that discard their own results from previous iterations
(i.e., reuse-arg and reuse-precision turned off), CoVeriTest achieves higher
coverage if switches between analyses happen rarely. Evaluation Plan: We look
at CoVeriTest configurations in which analyses discard their own previous
results and compare the number of covered test goals reported by configurations
that only differ in the analyses' time limits.

Claim 2. For analyses that reuse knowledge from their own previous
executions (i.e., reuse-arg or reuse-precision turned on), CoVeriTest achieves
higher coverage if it favors more powerful analyses. Evaluation Plan: We look at
CoVeriTest configurations in which analyses reuse their own previous knowledge
and compare the number of covered test goals reported by configurations that
only differ in the analyses' time limits.

Claim 3. CoVeriTest performs better if analyses reuse knowledge from their
own previous executions (i.e., reuse-arg or reuse-precision turned on).
Evaluation Plan: From each set of CoVeriTest configurations that only differ in the
analyses' time limits, we select the best configuration and compare these.

Claim 4. Interleaving multiple analyses with CoVeriTest often achieves better
results than using only one of the analyses for test generation. Evaluation Plan:
We compare the number of covered goals reported by the best CoVeriTest
configuration with the numbers achieved when running only one analysis of
the CoVeriTest configuration for the total time limit.

Claim 5. Interleaving verifiers for test generation is often better than running
them in parallel. Evaluation Plan: We compare the number of covered goals
reported by the best CoVeriTest configuration with the number achieved when
running all analyses of the CoVeriTest configuration in parallel.

Claim 6. CoVeriTest complements existing test-generation tools. Evaluation
Plan: We use the same infrastructure and resources as the International
Competition on Software Testing (Test-Comp'19)⁶ and let the best CoVeriTest
configuration construct test suites. These test suites are executed by the
Test-Comp'19 validator to measure the achieved branch coverage. Then, we compare
the coverage achieved by CoVeriTest with the coverage of the two best
test-generation tools from Test-Comp'19.


6 https://test-comp.sosy-lab.org/2019/
4.1 Setup


CoVeriTest Configurations. We implemented CoVeriTest in the software
analysis framework CPAchecker [15]. Basically, we implemented Algs. 1 and 2 and
integrated Alg. 3 into Alg. 2. For condition construction, we reuse the code from
conditional model checking [10]. For our experiments, we combine value [18] and
predicate analysis [16]. Both have been used in cooperative verification [10, 11, 21].

Value analysis. CPAchecker's value analysis [18] explicitly tracks the values of
the variables in its current precision, while assuming that the remaining
variables may have any possible value. It iteratively increases its precision, i.e.,
the set of variables to track, combining counterexample-guided abstraction
refinement [28] with path-prefix slicing [22] and refinement selection [21]. Value
analysis is efficient if few variable values need to be tracked, but it may get stuck
in loops or suffer from a large state space if variables are assigned many different values.
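The effect of the precision on the state space can be illustrated with a deliberately tiny Python sketch, assuming a single loop i = 0; while (i < n) i++; and counting the abstract states that reach the loop head. It is not CPAchecker's value analysis; TOP is a hypothetical marker for "any value" of an untracked variable.

```python
from typing import FrozenSet, Optional

TOP = None  # hypothetical marker: untracked variable, any value possible

def loop_head_states(n: int, precision: FrozenSet[str]) -> int:
    """Abstract states at the loop head of  i = 0; while (i < n) i++;
    Variables in the precision are tracked explicitly, all others are TOP."""
    init: Optional[int] = 0 if "i" in precision else TOP
    seen, work = {init}, [init]
    while work:
        i = work.pop()
        if i is TOP or i < n:          # guard i < n may hold
            nxt = TOP if i is TOP else i + 1
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)
```

Without i in the precision, a single abstract state suffices, but the guard i < n cannot be decided; once refinement adds i, the analysis unrolls the loop and visits n + 1 states (1001 for n = 1000), which is how loops with many different values blow up the state space.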

Predicate analysis. CPAchecker's predicate analysis uses predicate abstraction
with adjustable-block encoding (ABE) [16]. ABE is configured to abstract
at loop heads and uses the strongest postcondition at all remaining locations. To
compute the set of predicates (its precision), it uses counterexample-guided
abstraction refinement [28] combined with lazy refinement [43] and interpolation [41].
While the predicate analysis is powerful and often summarizes loops easily,
successor computation may require expensive SMT solver calls.

For both analyses, a CoVeriTest configuration specifies how Alg. 3 reuses
the ARGs returned by previous analysis runs to set up the initial ARG and
condition. In our experiments, we consider the following types of reuse.


plain Ignores all ARGs returned by previous analysis runs, i.e., reuse-arg,
reuse-prec, and use-condition are turned off.

cond_v The value analysis does not obtain information from previous ARGs and
the predicate analysis is only steered by the condition extracted from the
ARG returned by the previous run of the value analysis.

cond_p The value analysis is steered by the condition extracted from the ARG
returned by the previous run of the predicate analysis and the predicate
analysis ignores all previous ARGs.

cond_{v,p} Value and predicate analysis are steered by the condition extracted from
the last ARG returned, i.e., only use-condition is turned on.

reuse-prec In each round, each analysis resumes with its precision from the previous
round, but restarts exploration, i.e., only reuse-prec is turned on.

reuse-arg In each round, each analysis continues to explore the ARG it returned
in the previous round, i.e., only reuse-arg is turned on.

cond_v+r Similar to cond_v, but additionally the value analysis continues to
explore the ARG it returned in the previous round and the predicate analysis
restarts exploration with its precision from the previous round.

cond_p+r Similar to cond_p, but additionally the value analysis restarts
exploration with its precision from the previous round and the predicate analysis
continues to explore the ARG it returned in the previous round.

cond_{v,p}+r Like cond_{v,p}, but additionally the value and predicate analysis reuse
their previous precision, i.e., reuse-prec and use-condition are turned on.
Finally, we need to fix the time limit for each analysis. We want to find out
whether switches between analyses are important to the CoVeriTest approach.
Therefore, we chose four limits (10s, 50s, 100s, 250s) that are applied to both
analyses and trigger switches often, sometimes, or rarely. Additionally, we want
to study whether it is advantageous if the time CoVeriTest spends in a round
is not equally spread among the analyses. Thus, we add two more
time-limit pairs: (20s, 80s) and (80s, 20s).

We combine all nine reuse types with the six time-limit pairs, which results
in 54 CoVeriTest configurations. All 54 configurations aim at generating tests
that cover the assume edges of a program.
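The configuration space and the resulting switching frequency can be reproduced with a short Python sketch. It assumes the 15 min (900 s) CPU-time limit per run given under Computing Resources and, as a simplification, that every analysis run uses its full time limit (real runs may stop earlier, e.g., when all goals are covered). The names REUSE_TYPES, TIME_LIMITS, and switches are ours.

```python
from itertools import product
from typing import List, Tuple

REUSE_TYPES = ["plain", "cond_v", "cond_p", "cond_{v,p}", "reuse-prec",
               "reuse-arg", "cond_v+r", "cond_p+r", "cond_{v,p}+r"]
# (value analysis limit, predicate analysis limit) per round, in seconds
TIME_LIMITS = [(10, 10), (50, 50), (100, 100), (250, 250), (20, 80), (80, 20)]

CONFIGS: List[Tuple[str, Tuple[int, int]]] = list(product(REUSE_TYPES, TIME_LIMITS))

def switches(limits: Tuple[int, int], budget: int = 900) -> int:
    """Analysis switches within the CPU-time budget, assuming every run
    uses its full (positive) limit."""
    used, runs, current = 0, 0, 0
    while used < budget:
        used += limits[current]
        runs += 1
        current = (current + 1) % len(limits)
    return runs - 1
```

Under these assumptions, the 10s pair triggers 89 switches, whereas the 250s pair triggers only 3.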


Tools. For CoVeriTest, we used the implementation in CPAchecker
version 29347. Moreover, we compare CoVeriTest against the two best tools
VeriFuzz [26] and KLEE [23] from Test-Comp'19 (in the versions submitted to
Test-Comp'19⁷). The tool VeriFuzz is based on the evolutionary fuzzer AFL
and uses verification techniques to compute initial input values and parameters
for AFL. KLEE applies symbolic execution. To compare CoVeriTest against
KLEE and VeriFuzz, we use the validator TBF Test-Suite Validator v1.2⁸ to
measure branch coverage. TBF Test-Suite Validator is based on gcov⁹.


Programs. CoVeriTest, KLEE, and VeriFuzz produce tests for C programs.
All three tools participated in Test-Comp'19. Thus, for the comparison of the three
tools, we consider all 1720 tasks of the Test-Comp'19 benchmark set¹⁰ that
support the branch-coverage property. Since we do not need to execute tests
for the comparison of the different CoVeriTest configurations, we evaluated
them on a larger benchmark set, which contains all 6703 C programs from the
well-established SV-benchmark set¹¹ in the version tagged svcomp18.


Computing Resources. We run our experiments on machines with 33 GB
of memory and an Intel Xeon E3-1230 v5 CPU with 8 processing units and a
frequency of 3.4 GHz. The underlying operating system is Ubuntu 18.04 with
Linux kernel 4.15. As in Test-Comp'19, for test generation we grant each run a
maximum of 8 processing units, 15 min of CPU time, and 15 GB of memory. For
test-suite execution (required to compare against KLEE and VeriFuzz), the
TBF Test-Suite Validator is granted 2 processing units, 3 h of CPU time, and
7 GB of memory per run. We use BenchExec [20] to enforce the limits of a run.


Availability. Our experimental data are available online¹² [13].


7 https://gitlab.com/sosy-lab/test-comp/archives-2019/tree/testcomp19/2019
8 https://gitlab.com/sosy-lab/test-comp/archives-2019/blob/testcomp19/2019/tbf-testsuite-validator.zip
9 https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
10 https://github.com/sosy-lab/sv-benchmarks/tree/testcomp19
11 https://github.com/sosy-lab/sv-benchmarks
12 https://www.sosy-lab.org/research/coop-testgen/
Fig. 3. Comparing relative coverage (number of covered goals divided by maximal
number of covered goals) achieved by CoVeriTest configurations with different time
limits. All configurations let analyses discard their own knowledge gained in previous
executions.


4.2 Experiments 


Claim 1 (Reduce switching when discarding own results). Four types of
reuse (namely, plain, cond_v, cond_p, and cond_{v,p}) let the analyses discard their own
knowledge from their previous executions. For each of these types, we compare
the coverage achieved by all six CoVeriTest configurations that use this type.¹³
More concretely, for all six CoVeriTest configurations applying the same reuse
type, we first compute for each program the maximum over the number of covered
goals achieved by each of these six configurations for that program. Then, for
each of the six CoVeriTest configurations that use that reuse type, we divide
the number of covered goals achieved for a program by the respective maximum.
We call this measure relative coverage because the value is relative
to the maximum and not to the total number of goals. Figure 3 shows box plots
per reuse type. The box plots show the distribution of the relative coverage. The
closer the bottom border of a box is to value one, the higher the coverage achieved.
For all four reuse types, the fourth box plot has the bottom border closest to
value one. Since the fourth box plot is a configuration that grants each analysis
250s per round (highest limit considered, only three switches), the claim holds.
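The relative-coverage measure can be written down as a short Python sketch. We assume that all configurations were run on the same programs; skipping programs on which no configuration covers any goal (which would otherwise divide by zero) is our own choice, since the text does not say how such programs are treated.

```python
from typing import Dict, List

def relative_coverage(covered: Dict[str, Dict[str, int]]) -> Dict[str, List[float]]:
    """covered[config][program] = number of goals covered by config on program.
    Returns, per config, the per-program ratio to the best config on that program."""
    configs = list(covered)
    programs = list(covered[configs[0]])
    result: Dict[str, List[float]] = {c: [] for c in configs}
    for p in programs:
        best = max(covered[c][p] for c in configs)
        if best == 0:          # assumption: skip programs without any covered goal
            continue
        for c in configs:
            result[c].append(covered[c][p] / best)
    return result
```

Each box plot summarizes one such list; a value of one means that the configuration matched the best configuration on that program.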

Claim 2 (Favor powerful analysis when reusing own results). Five types
of reuse (namely, reuse-prec, reuse-arg, cond_v+r, cond_p+r, and cond_{v,p}+r) let
analyses reuse knowledge from their own previous executions. Similar to the
previous claim, we compute for each of these types the relative coverage of
all six configurations using this particular type of reuse. For each reuse type,
Fig. 4 shows box plots of the distributions of the relative coverage. As before, a
bottom border closer to value one reflects higher coverage. In all five cases, the last
box plot has the bottom border closest to value one. The last box plots represent
CoVeriTest configurations that grant the value analysis 20s and the predicate
analysis 80s in each round. Since the predicate analysis, which gets more time per
round, is more powerful than the value analysis, our claim is valid.¹⁴

13 Note that those six configurations only differ in the analyses' time limits.
14 This insight is partially and independently backed by a sequential combination of
explicit-value analysis and predicate analysis that performed well in SV-COMP 2013 [62].

Fig. 4. Comparing relative coverage (number of covered goals divided by maximal
number of covered goals) achieved by CoVeriTest configurations when using different
time limits and a fixed reuse type. All considered configurations let analyses reuse
knowledge from their own previous executions.

Claim 3 (Better reuse own results). So far, we know how to configure
time limits. Now, we want to find out how to reuse information from previous
analysis runs. For each reuse type, we select from the six available configurations
the configuration that performed best. Again, we use the relative coverage to
compare the resulting nine configurations. Figure 5 shows box plots of the
distributions of the relative coverage. The first four box plots show configurations
in which analyses discard their own results, while the last five box plots refer
to configurations in which analyses reuse knowledge from their own previous
executions. Since the last five boxes are smaller than the first four and their
bottom borders are closer to one, the last five configurations achieve higher
coverage. Hence, our claim holds. Moreover, from Fig. 5 we conclude that it is
best to reuse the ARG (although cond_v+r and cond_p+r come close).

Fig. 5. Comparing relative coverage achieved by CoVeriTest configurations applying
different strategies to reuse information gained by previous verifier runs.

Fig. 6. Comparing the coverage achieved by CoVeriTest (best configuration) with the
coverage achieved when running CoVeriTest's analyses alone or in parallel

Claim 4 (Interleave multiple analyses rather than use one of them).
To evaluate whether CoVeriTest benefits from interleaving, we compare
CoVeriTest against the analyses it uses. CoVeriTest interleaves value and
predicate analysis. Figures 6(a) and 6(b) show scatter plots that compare for each
program the coverage, i.e., the number of covered goals divided by the total number
of goals, achieved by the best CoVeriTest configuration (x-axis) with the coverage
achieved when only using either value or predicate analysis for test generation.
Note that we excluded from the scatter plots those programs for which we lack
the number of covered goals for at least one test generator, e.g., due to a timeout of
the analysis. Figure 6(a) compares CoVeriTest and value analysis; we see that
almost all points are in the lower right half. Thus, CoVeriTest typically achieves
higher coverage than value analysis alone. Figure 6(b), comparing CoVeriTest
with predicate analysis, is more diverse. About 54% of the points are on the
diagonal, i.e., CoVeriTest and predicate analysis cover the same number of
goals. The upper left half contains 19% of the points, i.e., predicate analysis
alone achieves higher coverage. These points reflect, for example, float programs
and ECA programs without arithmetic computations. In contrast, CoVeriTest
achieves higher coverage in 27% of the programs. CoVeriTest is beneficial for
programs that only need a few variable values to trigger the branches, like ssh
programs or programs from the product-lines subcategory. CoVeriTest also
profits from the value analysis when considering ECA programs with arithmetic
computations, since the variables have a fixed value in each loop iteration. All
in all, CoVeriTest performs slightly better than predicate analysis alone.
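The percentages above come from classifying each program's point relative to the diagonal. A minimal Python sketch of that classification, assuming one (covered by CoVeriTest, covered by the other generator, total goals) triple per program and None for a missing count:

```python
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[Optional[int], Optional[int], int]  # (covered_x, covered_y, total)

def classify(points: Iterable[Point]) -> Dict[str, float]:
    """x: goals covered by CoVeriTest, y: goals covered by the other generator.
    Both coverages share the denominator 'total', so comparing the counts is
    the same as comparing coverage. Points with a missing count are excluded."""
    counts = {"diagonal": 0, "lower_right": 0, "upper_left": 0}
    for x, y, total in points:
        if x is None or y is None or total == 0:
            continue
        if x == y:
            counts["diagonal"] += 1
        elif x > y:
            counts["lower_right"] += 1      # CoVeriTest covers more goals
        else:
            counts["upper_left"] += 1       # the other generator covers more
    n = sum(counts.values())
    return {k: (v / n if n else 0.0) for k, v in counts.items()}
```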

Claim 5 (Interleave rather than parallelize). Figure 6(c) shows a
scatter plot that compares for each program the coverage achieved by
CoVeriTest (x-axis) and by a test generator that runs the value analysis and
the predicate analysis in parallel.¹⁵ As before, we exclude programs for which
we could not get the number of covered goals for at least one of the analyses.
Looking at Fig. 6(c), we observe that many points (60%) are on the diagonal, i.e.,
the achieved coverage is identical. Moreover, CoVeriTest performs better for
30% (lower right half), while approximately 10% of the points are in the upper
left half. Since CoVeriTest achieves the same or better coverage in about
90% of the cases, it should be preferred over parallelization. This is no surprise,
since we showed that a test generator should favor the more powerful analysis
(which CoVeriTest does, whereas parallelization distributes CPU time evenly).

15 The test generator uses CPAchecker's parallel algorithm and lets the two analyses
share information about covered test goals.

Fig. 7. Comparing the branch coverage achieved by CoVeriTest (best configuration)
with the branch coverage achieved by existing state-of-the-art test-generation tools
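The CPU-time argument of Claim 5 can be made concrete with a back-of-the-envelope Python sketch. It assumes the 900 s CPU-time budget and that every analysis keeps the CPU busy until its slice or the budget is used up; actual runs may deviate, e.g., when an analysis finishes early.

```python
from typing import List, Sequence

def interleaved_cpu(limits: Sequence[int], budget: int = 900) -> List[int]:
    """CPU seconds per analysis under round-robin with positive per-run limits,
    assuming every run uses its whole slice until the budget is spent."""
    shares = [0] * len(limits)
    used, current = 0, 0
    while used < budget:
        slice_ = min(limits[current], budget - used)
        shares[current] += slice_
        used += slice_
        current = (current + 1) % len(limits)
    return shares

def parallel_cpu(num_analyses: int, budget: int = 900) -> List[float]:
    """Even split, assuming all analyses stay busy until the budget is spent."""
    return [budget / num_analyses] * num_analyses
```

Under these assumptions, the (20s, 80s) configuration gives the predicate analysis 720 s of the budget, whereas the parallel generator gives each analysis 450 s.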

Claim 6 (CoVeriTest complementary). Our goal is to compare
CoVeriTest and the two best tools of Test-Comp'19 [4]: VeriFuzz and KLEE.
All three tools aim at constructing test suites with high branch coverage. Thus, we
use branch coverage as the comparison criterion. We measure branch coverage with
the TBF Test-Suite Validator. Figure 7 shows two scatter plots. Each plot compares
the branch coverage achieved by CoVeriTest and by one of the other tools.¹⁶
Points in the lower right half indicate that CoVeriTest achieved higher coverage.
Looking at the two scatter plots, we observe that there exist programs for
which CoVeriTest performs better and vice versa. Generally, we observed that
CoVeriTest has problems with array tasks and ECA tasks. We already know
from verification that CPAchecker sometimes lacks refinement support for array
tasks. Moreover, the problem with the ECA tasks is that CPAchecker splits
conditions with conjunctions or disjunctions (which ECA tasks contain a lot of)
into multiple assume edges. Thus, the number of test goals is much larger than
the number of actual branches to be covered. However, CoVeriTest seems to benefit from
this splitting for some of the float tasks. Additionally, CoVeriTest is often better on
tasks of the sequentialized subcategory. We think that CoVeriTest benefits from
the value analysis there, since the tasks of the sequentialized subcategory contain many
branch conditions that check for a specific value or interpret variable values as
Booleans. All in all, CoVeriTest is not always best, but it is also not dominated.
Thus, CoVeriTest complements the existing approaches.
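Why splitting compound conditions inflates the number of test goals can be seen in a small Python sketch. It uses the textbook short-circuit translation of a branch condition into assume edges; we assume, but have not verified, that CPAchecker's translation has the same shape. And, Or, and split are hypothetical names.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

@dataclass
class And:
    left: Expr
    right: Expr

@dataclass
class Or:
    left: Expr
    right: Expr

Expr = Union[str, And, Or]        # atoms are strings such as "x > 0"
AssumeEdge = Tuple[str, str, str]  # (source location, assumption, target location)

def split(expr: Expr, then_loc: str, else_loc: str,
          edges: List[AssumeEdge], counter: Iterator[int]) -> str:
    """Short-circuit translation; returns the entry location of the condition."""
    if isinstance(expr, str):
        loc = f"n{next(counter)}"
        edges.append((loc, expr, then_loc))           # atom holds
        edges.append((loc, f"!({expr})", else_loc))   # atom fails
        return loc
    right = split(expr.right, then_loc, else_loc, edges, counter)
    if isinstance(expr, And):     # left must hold before right is evaluated
        return split(expr.left, right, else_loc, edges, counter)
    return split(expr.left, then_loc, right, edges, counter)  # Or
```

For x > 0 && (y == 1 || z == 2), the sketch yields six assume edges, i.e., six test goals, although the source branch has only two outcomes.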


16 Note that the scatter plots only contain points that have a positive x and y value 
because there exist different reasons (timeout, out of memory, tool failure, etc.) why 
we might get no or a zero coverage value from the test validator. The plots contain 
points for about 98% of the 1720 programs. 
4.3 Threats to Validity


All our CoVeriTest configurations consider the same two analyses. Our results
might not apply if CoVeriTest is used with a different set of analyses. In our
experiments, we used benchmark programs instead of real-world applications.
Although the benchmark set is diverse and well-established, our results may not
carry over to practice.

The TBF Test-Suite Validator might contain bugs that result
in wrong coverage numbers. However, the validator was already used in Test-Comp'19
and is based on the well-established coverage-measurement tool gcov.

For the comparison of the CoVeriTest configurations as well as the comparison
of CoVeriTest with the single analyses and the parallel approach, we relied
on the number of covered goals reported by CoVeriTest. Invalid counterexamples
could be used to cover test goals. However, the analyses used by CoVeriTest apply
CEGAR approaches and should detect spurious counterexamples. Moreover, these
analyses run in the SV-COMP configuration of CPAchecker and are tuned to
not report false results. Another problem is that whenever CPAchecker does not
output statistics (due to timeout, out of memory, etc.), we use the last number of
covered goals reported in the log, which might underapproximate
the number of covered goals. None of these problems occurs in the comparison
of CoVeriTest with KLEE and VeriFuzz, in which the coverage is measured by
the validator. Thus, this comparison still supports the value of CoVeriTest.


5 Related Work 


CoVeriTest interleaves reachability analyses to construct tests for C programs.
To enable cooperation, CoVeriTest extracts information from ARGs constructed
by previous analysis runs.

A few tools use reachability analyses for test generation. Blast [5] considers
a target predicate p and generates a test for each program location that can be
reached with a state fulfilling the predicate p. For test generation, Blast uses
predicate abstraction. FShell [44-46] and CPA/Tiger [12] generate tests for
a coverage criterion specified in the FShell query language (FQL) [46]. Both
transform the FQL specification into a set of test-goal automata and check for
each automaton whether its final state can be reached. FShell uses CBMC to
answer those reachability queries and CPA/Tiger uses predicate abstraction.

Various combinations have been proposed for verification [2, 10, 11, 14, 25, 27,
29-31, 35, 37, 40, 50, 64] and test-suite generation [1, 32, 34, 36, 38, 47, 51, 54, 56,
59, 60, 63]. We focus on combinations that interleave approaches. Synergy [40]
and Dash [2] alternate test generation and proof construction to (dis)prove a
property. Similarly, Smash [37] combines underapproximation with
overapproximation. Interleaving is also used in test generation. Hybrid concolic
testing [54] interleaves random testing with symbolic execution. When random testing
gets stuck, symbolic execution is started from the current state. As soon as a new goal
is covered, symbolic execution hands over to random testing, providing the values
used to cover the goal. Similarly, Driller [60] and Badger [56] combine fuzzing
with concolic execution. However, they only exchange inputs. Xu et al. [51, 63]
interleave different approaches to augment test suites. The approach closest to
CoVeriTest is abstraction-driven concolic testing [32]. Abstraction-driven
concolic testing interleaves concolic execution and predicate analysis. Furthermore,
it uses conditions extracted from the ARGs generated by the predicate analysis to
direct the concolic execution towards feasible paths. Abstraction-driven concolic
testing can be seen as one particular configuration of CoVeriTest.

ARG information has also been reused in other contexts. Precision
reuse [19] uses the precision determined in a previous analysis run to reverify
a modified program. Similarly, extreme model checking [42] adapts an ARG
constructed in a previous analysis to fit the modified program. CPA/Tiger [12]
transforms an ARG that was constructed for one test goal such that it fits a
new test goal. Lazy abstraction refinement [43] adapts an ARG to continue
exploration after abstraction refinement. Configurable program certification [48, 49]
constructs a certificate from an ARG, which can be used to reverify a program.
Similarly, reachability tools like CPAchecker construct witnesses [6, 7] from
ARGs. Conditional model checking [10, 14] constructs a condition from an ARG
when a verifier gives up. The condition describes the remaining verification task
and is used by a subsequent verifier to restrict its exploration.


6 Conclusion 


Testing is a standard technique for software quality assurance. But state-of- 
the-art techniques still miss many bugs that involve sophisticated branching 
conditions [17]. It turns out that techniques performing abstract reachability 
analyses are well-suited for this task. They simply need to check the reach- 
ability of every branch and generate a test for each positive check. However, in 
practice, for every such technique there exist reachability queries on which the 
technique is inefficient or fails [8]. We propose CoVeriTest to overcome these
practical limitations. CoVeriTest interleaves different reachability analyses for
test generation. We experimented with various configurations of CoVeriTest,
which vary in the time limits of the analyses and the type of information
exchanged between different analysis runs. CoVeriTest works best when each
analysis resumes its exploration, different analyses only share test goals, and more
powerful analyses get larger time budgets. Moreover, a comparison of CoVeriTest
with (a) the reachability analyses used by CoVeriTest and (b) state-of-the-art
test-generation tools demonstrates the benefits of the new CoVeriTest approach.

CoVeriTest participated in Test-Comp 2019 [4] and achieved rank 3 (out of 9)
in both categories, bug finding and branch coverage.¹⁷

In the future, we plan to integrate further analyses, e.g., bounded model
checking or symbolic execution, into CoVeriTest and to evaluate CoVeriTest
on real-world applications.


17 https://test-comp.sosy-lab.org/2019/results/
Abstract. Test cases play an important role in testing and debugging
software. Smaller tests are easier to understand and use for these tasks. 
Given a test that demonstrates a bug, test case reduction finds a smaller 
variant of the test case that exhibits the same bug. Classically, one of the 
challenges for test case reduction is that the process is slow, often taking 
hours. For hierarchically structured inputs like source code, the state of
the art is Perses, a recent grammar-aware and queue-driven approach for
test case reduction. Perses traverses nodes in the abstract syntax tree 
(AST) of a program (test case) based on a priority order and tries to 
reduce them while preserving syntactic validity. 

In this paper, we show that Perses’ reduction strategy suffers from pri- 
ority inversion, where significant time may be spent trying to perform 
reduction operations on lower priority portions of the AST. We show that 
this adversely affects the reduction speed. We propose PARDIS, a tech- 
nique for priority aware test case reduction that avoids priority inversion. 
We implemented PARDIS and evaluated it on the same set of benchmarks 
used in the Perses evaluation. Our results indicate that compared to 
Perses, PARDIS is able to reduce test cases 1.3x to 7.8x faster and with 
46% to 80% fewer queries. 


Keywords: Test case reduction · Automated debugging ·
Priority aware reduction


1 Introduction 


Test case reduction is a technique that aids in testing and debugging software. 
When an input for a program causes the program to exhibit a property of interest, 
like a bug, finding a smaller input that also exhibits the property can help to 
explain the behavior [1-3]. Given an input $I \in \mathbb{I}$ and an oracle $\psi : \mathbb{I} \to \mathbb{B}$ that
performs a test and returns true iff a property holds, test case reduction aims to
find a smaller input $I'$ such that $\psi(I') = \mathit{true}$. Often, this problem is approached
through Delta Debugging (DD), a longstanding and effective algorithm for test 
case reduction that essentially generalizes binary search [2]. However, for inputs 
with significant structure, generic DD can perform poorly, requiring significant 
time and not performing much reduction [3,4]. For compilers in particular, where 
the inputs must be valid programs, this has led to specialized techniques like
Hierarchical Delta Debugging [3,4], language-specific reducers like C-Reduce [5],
and most recently to Syntax Guided Program Reduction as seen in Perses [6]. 
Syntax Guided Program Reduction (SGPR) is the present state of the art
for compiler-targeted test case reduction. The intuition behind SGPR is that the
grammar defining the language of inputs eliminates many invalid sub-inputs from 
the search space. For example, when an input must adhere to the C programming 
language [7], removing the return type of a function declaration would not be 
valid because the C grammar specifies that the return type is required. Such 
syntactically invalid inputs are removed from the search space by SGPR. 
Perses, a form of SGPR, takes as arguments not only a program p and oracle
ψ, but also the context-free grammar G of valid inputs [6]. It transforms the
grammar so that removable parts of the input can be identified by the names
of the grammar rules used to parse them. This also normalizes the grammar so
that all removable components are expressed through quantifiers in an extended
context-free grammar [8], i.e., optionality (?) and lists (*, +). This transformation
is illustrated in Fig. 1. Notice, for instance, that the recursive rule BAR denoting 
a list is transformed (==>) into a Kleene-+ quantified list. Individual elements of 
the list may be removed while preserving syntactic validity. Perses then parses 
the input of interest into an abstract syntax tree (AST) and traverses the AST 
while trying to (1) remove optional nodes and (2) perform DD to minimize 
the children of nodes representing lists. The grammar transformations have the 
benefit of making many syntactically correct removals easy and efficient to locate. 


(a) FOO → a | a b    ==>   FOO → a FOO_opt
                           FOO_opt → b?

(b) BAR → c | c BAR  ==>   BAR → BAR_plus
                           BAR_plus → c+

(a) Optional elements like b are refactored into rules with ? quantifiers.
(b) Lists of elements are refactored into rules with * or + quantifiers.


Fig. 1. Overview of Perses grammar transformations for SGPR. 


Perses has significantly improved the speed of program reduction. However,
it still takes several hours to reduce some inputs. Consider the code in Listing 1.1
along with its AST in Fig. 3. This example is similar to a C program generated
by the compiler testing tool CSmith [9]. In this example, Perses first considers
the root node of the AST, node 1. Since the rule for this node ends in a star,
it is a list node, and its children are the elements of the list. Thus, Perses applies
DD to the list of children of node 1 to minimize the number of children. When
such lists are long, significant time can be devoted to this task. We show in
Sect. 4 that this can lead to substantial stalls in reduction, where no progress
is made while a list is being processed. However, most of the children of this
node have a low token weight, i.e., few tokens beneath them; the token weight of
each node is denoted by w in Fig. 3. Indeed, greater value would be found by focusing
on just one of its children, node 5, which contains the majority of the input
beneath it. By spending greater effort up front on portions of the AST of lesser
value, Perses suffers from a form of priority inversion. Priority inversion occurs
when a low priority task is scheduled instead of a high priority task. In this case,
Perses focuses on removing low token weight nodes instead of high token weight
nodes. Indeed, Perses may even fail to remove elements that would enable better
reduction success overall. In this case, the declarations of foo, S, and d are used
within the code beneath node 5. Thus, those uses need to be eliminated before
any of the declarations can be removed successfully. In practice, we find that
priority inversion has a significant impact on reduction time in SGPR.

To address priority inversion, we have developed priority aware reduction 
strategies for program reduction. By focusing the reduction effort on the nodes 
of the AST that cover the greatest number of tokens, we prioritize reduction 
of the most complex parts of the input first. This has multiple important ben- 
efits: (1) Dependencies between program elements are more likely to be broken 
by eliminating the complex uses first. (2) Stalls in reduction from unsuccessful 
rounds of DD can be mitigated. (3) By removing large portions of an input earlier
on, each oracle query to ψ can take less time because smaller inputs tend to
be faster to check. We have designed and evaluated a tool, PARDIS, that makes 
use of these techniques and found that it leads to consistent and significant 
performance improvements over Perses on the Perses benchmarks [6]. 

In summary, this paper makes the following contributions: 


1. Priority awareness. We identify priority inversion as a key problem facing 
SGPR techniques and develop priority aware reduction strategies as a poten- 
tial solution. Priority aware reduction strategies focus the reduction effort on 
the complex portions of an input first, enabling earlier and thus faster test 
case reduction (Sects. 3, 4.1). 

2. Optimization. We identify redundancies in the reduction process when using 
Perses’ transformed grammars and develop a solution to prune them from the 
candidate search space (Sect. 3.2). 

3. Significant performance improvement. We implemented our strategies 
in a tool, PARDIS, and evaluated it on the same benchmarks used by Perses. 
Experimental results show that PARDIS both removes more of the input earlier 
on and is faster overall. Compared to Perses, PARDIS reduces test cases 1.3x 
to 7.8x faster and with 46% to 80% fewer oracle queries (Sect. 4.1). 


2 Background and Motivation 


Consider again the example in Fig. 3 and suppose that the oracle ψ checks that
this program p prints "Hello World!" on line 24 (marked with (*)). Thus,
the smallest subprogram for which ψ returns true is the main function with the
desired print statement.

To search for this smaller input inside the original input, Perses traverses the 
AST using a priority queue ordered by the token weight. In each trial, the node 
Listing 1.1: A C program with property of interest on line 24.

 1  double d = 0.10;
 2  struct S {
 3    int f1;
 4    int f2;
 5  };
 6  void foo(struct S s, char str[]){
 7    double v = s.f2 + s.f2 * d;
 8    printf("%s %f\n",str,v);
 9  }
10  int main() {
11    unsigned int a = 1;
12    char b[] = "first";
13    char c[] = "second";
14    if (a) {
15      struct S s1;
16      s1.f1 = 1;
17      s1.f2 = 4000;
18      struct S s2;
19      s2.f1 = 2;
20      s2.f2 = 2000;
21      foo(s1, b);
22      foo(s2, c);
23    }
24    printf("Hello World!\n");  (*)
25    return 0;
26  }

[Fig. 2 (table image): panels (a) Perses, (b) PARDIS and (c) PARDIS HYBRID, each listing, for every trial, the set of nodes removed and whether the trial succeeded (T) or failed (F).]
Fig. 2. One round of removal trials in 
Perses, PARDIS and PARDIS HYBRID for 
the AST in Fig. 3. Numbers are node IDs. 


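[Fig. 3 (image): AST of Listing 1.1 annotated with token weights, e.g., root node 1 (translationUnit_star, w:137), node 4 (func_def_foo, w:36) and a subtree of weight w:85.]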


Fig. 3. AST of the program in Listing 1.1. w denotes the token weight of each node. 


with the maximum weight is removed from the work queue and traversed. In 
our example, the queue starts out containing only the root of the AST, node 1.
Perses performs specific reduction operations on different types of nodes during 
traversal. For instance, on optional nodes, Perses tries to remove the optional 
child node. For list nodes, Perses minimizes the list of children using DD. Any 
remaining children of the traversed node are then added to the priority queue
in order to be traversed in the future. 

Observe that in this example, Perses will first examine node 1 and remove
it from the queue. Because node 1 is a list node, DD is applied to its children.
Different combinations of children are removed from node 1, and each result is
checked by ψ to find a smaller input. First, all children are removed and ψ is checked.
After this fails, the first half of the children (nodes 2 and 3) is removed, but ψ
returns false again because this removes required declarations. Since removing
the second half of the children (nodes 4 and 5) also fails, the process continues
recursively. First DD tries shrinking the list by removing each individual child,
and next it tries keeping only each individual child. Ultimately none of the trials
succeed, so all children are added to the queue, and reduction continues with
node 5. The intervening node 6 is not tested by SGPR because it is not
syntactically removable. The next node removed from the work queue is node
7. This continues until the queue is empty. The precise trials exercised in this
process are illustrated in Fig. 2(a). Note that 16 steps elapse until a successful
trial occurs.
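The trial order described above follows the usual delta debugging pattern. The Python sketch below illustrates it on a flat list. It is an illustrative ddmin-style reimplementation, not Perses' code. The names `dd_reduce` and `property_holds` are hypothetical, and the initial "remove everything" trial mirrors the description above. With an oracle that rejects every trial, a four-element list yields 11 trials: remove all, the two halves, the four singletons, and the four complements.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dd_reduce(items: Sequence[T],
              property_holds: Callable[[List[T]], bool],
              trials: List[Tuple[T, ...]]) -> List[T]:
    """Minimize a list with a ddmin-style search (illustrative sketch).

    property_holds(kept) stands in for the oracle: it returns True iff the
    property of interest still holds when only `kept` remains.
    Every attempted removal set is appended to `trials`.
    """
    current = list(items)
    # First trial, as described above: remove the whole list at once.
    trials.append(tuple(current))
    if property_holds([]):
        return []
    n = 2  # current granularity (number of chunks)
    while len(current) >= 2:
        size = -(-len(current) // n)  # ceiling division
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        progress = False
        # 1) Try removing each chunk on its own.
        for chunk in chunks:
            trials.append(tuple(chunk))
            kept = [x for x in current if x not in chunk]
            if property_holds(kept):
                current, n, progress = kept, max(n - 1, 2), True
                break
        # 2) Try keeping only one chunk, i.e. removing its complement.
        #    With two chunks the complements equal the chunks, so skip them.
        if not progress and len(chunks) > 2:
            for chunk in chunks:
                removed = [x for x in current if x not in chunk]
                trials.append(tuple(removed))
                if property_holds(list(chunk)):
                    current, n, progress = list(chunk), 2, True
                    break
        if not progress:
            if n >= len(current):
                break  # single-element granularity and nothing removable
            n = min(2 * n, len(current))
    return current


# Usage: an oracle that rejects every trial, as for the children of node 1.
log: List[Tuple[int, ...]] = []
result = dd_reduce([2, 3, 4, 5], lambda kept: False, log)
```

Because every child fails, DD spends all 11 oracle queries on this list without removing anything. This is the kind of stall that the priority aware strategies below aim to avoid.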

Although Perses orders its work queue by token weight, the weight that matters
is that of the traversed node, and it determines when the children of that node
are tried for removal. Thus, any node whose parent in the AST is a list is given
the same priority as all other elements in the list. This is because DD recursively
tries to minimize the entire list until no single element can be removed, regardless
of the priorities of individual list elements. As a result, Perses must employ DD on
all children of node 1 even though it would be more beneficial to focus on just one
child, node 5.

Instead, PARDIS models the priorities more directly. We note that in an
optional or list node, such as node 1, each child may be removed in a syntactically
valid fashion. We call such removable nodes nullable. When traversing a nullable
node in the AST, we can simply try to remove it directly, adding its children
if the removal fails. For instance, in the running example, we would visit node 1
first. Because node 1 cannot be removed, we would simply add its children to the
priority queue. Note that all children of node 1 are nullable, but node 5 has the
highest token weight. Thus, we next select node 5 to traverse, but removing it also
fails. From the given token weights, we next traverse node 6, which is syntactically
not removable, and then node 7, which we attempt to remove without success. Next,
node 11 is visited and successfully removed. Removing node 11 enables the removal
of nodes 4, 3 and 2. Thus, they are removed in a single pass of the tree using PARDIS,
whereas Perses would require multiple traversals of the AST to remove them.
This process continues until the desired output is achieved. As seen in Fig. 2(b),
just 4 steps elapse until the first successful trial removes node 11.

Note that in this example, PARDIS is able to reduce to the desired output in 
a single pass, while Perses requires multiple passes of the AST. In practice, all 
program reduction techniques continue until a fixed point is reached, including 
PARDIS. However, PARDIS can achieve greater reduction in a single traversal of
the AST, accelerating convergence on the fixed point. 

This priority aware approach can still have drawbacks, however. After focus- 
ing on the highest priority nodes, there may be many lower priority nodes remain- 
ing. For example, there are multiple remaining nodes of weight 7 in the tree after 
performing the reduction by PARDIS as described above. We also show experi-
mentally that these lower priority nodes occur in practice in Sect. 5. The above
approach of PARDIS considers each node one at a time, which can perform poorly
when reducing such long lists. We therefore also propose a hybrid
approach that still prioritizes nodes by maximum token weight but also uses a list
based reduction technique for spans of nodes that have the same token weight.
This hybrid approach is able to achieve the benefits of being priority aware while 
still avoiding the cost of considering each node of the AST individually. 
Section 3 presents the algorithms behind these techniques in detail. 


3 Approach 


Recall that the core of PARDIS, similar to Perses, maintains a priority queue of 
the nodes in an AST and traverses the nodes in order to process them. It also 
makes use of Perses Normal Form, the result of the grammar transformations 
that Perses introduced [6]. The key difference is that instead of using the token 
weight of a parent node to determine when its nullable children may be removed, 
PARDIS identifies all nullable nodes (see Sect. 3.2) and uses their token weights
directly to prioritize the search. The core algorithm for this process is quite 
straightforward and presented in Algorithm 1. 


Algorithm 1: Priority queue driven program reduction.

Input: p : P, the program to reduce as an AST
Input: ψ : P → B, oracle for the property to preserve
Input: ρ : V → ℕ × ··· × ℕ, prioritizer for AST nodes
Result: A minimum program p ∈ P s.t. ψ(p)
1  work ← MaxPriorityQueue({p.root}, ρ)
2  while !work.empty() do
3    node ← work.takeMax()
4    if node.isNullable ∧ ψ(p − node) then
5      p ← p − node
6    else
7      work.insert(node.children)
8  return p


Line 1 of the algorithm constructs the priority queue (a max-heap), initializing
it with the root of the AST and using a parameterizable prioritizer ρ. ρ is
simply a function that takes a node and returns its priority as a tuple. The
priority queue selects the element with a lexicographically maximal priority, so
ties on the first element of the priority tuple are broken by the second element
and so on. As seen in Fig. 4, for PARDIS, ρ_PARDIS returns a pair of numbers, the
token weight of the node and the position of the node in a decreasing, right-to-left,
breadth first search. The specific breadth first order means that for an AST
with n nodes, bfsOrder(p.root) = n, the last child c of p.root has bfsOrder(c) = n − 1,
and so on. Thus, if several nodes have the same token weight, the one highest in 
the AST and furthest to the right is selected next. This ordering decreases the 
chances of trying to remove a declaration before its uses [10]. 
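The two components of ρ_PARDIS can be computed as follows. This is a minimal Python sketch, assuming a hypothetical `Node` class in which every childless node counts as one token. A real parser AST would count only terminal tokens. The BFS enqueues children right to left and numbers nodes downward from n, so the root gets n and its last child gets n − 1.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self, label: str, children: Optional[List["Node"]] = None) -> None:
        self.label = label
        self.children = children or []   # assumption: a childless node is one token


def tokens_below(n: Node) -> int:
    """tokensBelow(n): number of token leaves under n."""
    if not n.children:
        return 1
    return sum(tokens_below(c) for c in n.children)


def bfs_order(root: Node) -> Dict[Node, int]:
    """bfsOrder: decreasing numbers in a right-to-left breadth first search."""
    visited: List[Node] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        visited.append(node)
        queue.extend(reversed(node.children))   # enqueue children right to left
    total = len(visited)
    return {node: total - i for i, node in enumerate(visited)}   # root gets n


def rho_pardis(n: Node, order: Dict[Node, int]) -> Tuple[int, int]:
    return (tokens_below(n), order[n])


# Usage: two one-token children tie on weight; the rightmost one wins.
left, right = Node("x"), Node("y")
pair = Node("pair", [Node("a"), Node("b")])
root = Node("root", [left, pair, right])
order = bfs_order(root)
```

Here `pair` has the largest token weight and is selected first. Between the equally weighted `left` and `right`, the tie-breaker prefers `right`, the node furthest to the right.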

Line 2 starts the core of the algorithm. While there are more nodes to explore 
in the queue, the node with the next highest priority is considered. If it is nullable 


PARDIS: Priority Aware Test Case Reduction 415 


and can be successfully removed, we remove it from the AST, otherwise we add 
its children to the queue so that they will also be traversed. 
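The following Python sketch mirrors Algorithm 1 under simplifying assumptions. It is not the PARDIS implementation. The AST is a hypothetical `Node` class, "removing" a node only sets a flag, and the oracle is a plain Python callable rather than a compiler test script. In the usage example, a large statement uses a small declaration. Because the statement has the larger token weight, it is tried first, which lets the declaration be removed in the same pass.

```python
import heapq
import itertools
from typing import Callable, List, Optional, Tuple


class Node:
    """Minimal AST node for the sketch (not the PARDIS data structure)."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None,
                 nullable: bool = False, tokens: int = 0) -> None:
        self.name = name
        self.children = children or []
        self.nullable = nullable
        self.tokens = tokens      # tokens owned directly by this node
        self.removed = False      # True once the node is deleted from p


def weight(n: Node) -> int:
    """Token weight: own tokens plus those of all non-removed descendants."""
    return n.tokens + sum(weight(c) for c in n.children if not c.removed)


def reduce_once(root: Node,
                oracle: Callable[[Node], bool],
                priority: Callable[[Node], Tuple[int, ...]]) -> Node:
    """One pass of Algorithm 1; the tree is modified in place."""
    tie = itertools.count()  # breaks ties so heap entries never compare Nodes
    work: List[Tuple[Tuple[int, ...], int, Node]] = []

    def push(n: Node) -> None:
        # heapq is a min-heap: negate the priority tuple to pop the maximum.
        heapq.heappush(work, (tuple(-x for x in priority(n)), next(tie), n))

    push(root)                                  # line 1
    while work:                                 # line 2
        node = heapq.heappop(work)[2]           # line 3: takeMax
        if node.nullable:
            node.removed = True                 # candidate p - node
            if oracle(root):                    # line 4: psi(p - node)
                continue                        # line 5: keep the removal
            node.removed = False                # property lost: undo
        for child in node.children:             # line 7
            if not child.removed:
                push(child)
    return root


# Usage: a declaration, a larger statement that uses it, and a statement
# that must survive; the oracle requires the declaration while it is used.
decl = Node("decl", nullable=True, tokens=5)
use = Node("use", nullable=True, tokens=20)
keep = Node("keep", nullable=True, tokens=3)
root = Node("root", [decl, use, keep])
reduce_once(root,
            lambda r: not keep.removed and (not decl.removed or use.removed),
            lambda n: (weight(n),))
```

Any tuple-valued prioritizer can be passed as `priority`, which is how the same loop can reproduce the different orderings discussed next.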

While the algorithm is surprisingly simple, we have found it to perform sig- 
nificantly better than the state of the art in practice. As we explore in Sect. 4.2, 
this results from prioritizing the search toward those portions of the input where 
reduction can have the greatest impact. To more closely compare with Perses, 
consider a version of Perses that upon visiting a list or optional node only tries
removing each child of that node once¹. This "one node at a time" variant of
Perses can also be implemented using Algorithm 1 by carefully choosing the
prioritizer ρ. Because Perses considers removing the children of the nodes it
traverses, it actually prioritizes the work queue using the token weight of the
parent rather than the token weight of the nullable nodes being considered for removal.
This leads to the alternative prioritizer ρ_Perses presented in Fig. 4. Observe that
all children of a list node receive the same priority, derived from the token weight
of the entire list. This can inflate the priority of some nodes in the work queue and
leads to poor performance.


Definitions:
  tokensBelow(n): returns the number of tokens beneath an AST node n.
  bfsOrder(n): returns the position of an AST node n in a decreasing,
    right-to-left, breadth first search.
Prioritizers:
  ρ_PARDIS(n) = (tokensBelow(n), bfsOrder(n))
  ρ_Perses(n) = let parentWeight ← tokensBelow(n.parent) if n.parent else ∞ in
                let parentOrder ← bfsOrder(n.parent) if n.parent else ∞ in
                (parentWeight, parentOrder)
  ρ_PARDIS HYBRID(n) = let parentOrder ← bfsOrder(n.parent) if n.parent else ∞ in
                       (tokensBelow(n), parentOrder, bfsOrder(n))


Fig. 4. Prioritizers used for PARDIS, node at a time Perses, and PARDIS HYBRID. 
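The three prioritizers of Fig. 4 translate directly into tuple-valued functions. In this Python sketch, a hypothetical `Node` carries tokensBelow and bfsOrder as precomputed fields, and the weights in the usage example are illustrative values loosely modeled on the running example. The sketch highlights the key contrast: ρ_Perses gives every child of a list the same priority, whereas ρ_PARDIS separates them by their own weight.

```python
import math
from typing import Optional, Tuple

Priority = Tuple[float, ...]


class Node:
    """AST node with precomputed tokensBelow and bfsOrder values (assumed given)."""

    def __init__(self, weight: int, order: int, parent: Optional["Node"] = None) -> None:
        self.weight = weight   # tokensBelow(n)
        self.order = order     # bfsOrder(n)
        self.parent = parent


def rho_pardis(n: Node) -> Priority:
    return (n.weight, n.order)


def rho_perses(n: Node) -> Priority:
    # Node-at-a-time Perses: a node inherits its parent's weight and order.
    parent_weight = n.parent.weight if n.parent else math.inf
    parent_order = n.parent.order if n.parent else math.inf
    return (parent_weight, parent_order)


def rho_hybrid(n: Node) -> Priority:
    # Same-weight siblings share the first two components, so they are
    # adjacent in the queue and can be taken out as one group.
    parent_order = n.parent.order if n.parent else math.inf
    return (n.weight, parent_order, n.order)


# Usage: a list node (weight 137) with four children of different weights.
lst = Node(weight=137, order=10)
kids = [Node(w, o, lst) for w, o in [(5, 6), (11, 7), (36, 8), (85, 9)]]
twin_a, twin_b = Node(7, 3, kids[3]), Node(7, 2, kids[3])
```

All four children receive the identical priority (137, 10) under `rho_perses`. Under `rho_pardis`, the weight-85 child is taken first.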


Like other program reduction algorithms [3,5,6,11,12], Algorithm 1 is used 
to compute a fixed point. That is, in practice the algorithm is repeated until 
no further reductions can be made. As in prior work, we omit this from our 
presentation for clarity. In theory, this means that the worst case complexity of 
the technique is O(n²) where n is the number of nodes in the AST. This arises
when only one leaf of the AST is removed in each pass through the algorithm. 
In practice, most nodes are not syntactically nullable, and we show in Sect. 4.1 
that performance of PARDIS exceeds the state of the art. 
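The fixed-point iteration omitted from Algorithm 1 can be sketched as an outer loop around a single pass. In this Python sketch, `one_pass` and `size` are hypothetical parameters, not the authors' interface. The usage example shows the worst case described above: each pass removes a single element, so the number of passes grows linearly with the input. With an O(n) pass, this gives the quadratic bound.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")


def reduce_to_fixed_point(program: P,
                          one_pass: Callable[[P], P],
                          size: Callable[[P], int]) -> P:
    """Repeat one reduction pass until it no longer makes the input smaller."""
    while True:
        candidate = one_pass(program)
        if size(candidate) >= size(program):
            return program          # no progress: fixed point reached
        program = candidate


# Usage: a worst-case pass that removes only one element per traversal,
# so n elements need n passes (n - 1 shrinking ones plus a final check).
calls: List[int] = []


def drop_first(p: List[int]) -> List[int]:
    calls.append(len(p))
    return p[1:] if len(p) > 1 else p


result = reduce_to_fixed_point([1, 2, 3, 4], drop_first, len)
```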

In addition, while we focus on removing nodes of the AST, Perses also tries 
to replace non-list and non-optional nodes with compatible nodes in their subtrees.
We do not focus on this aspect of the algorithm. In practice, we found it to
significantly hurt performance (see Sect. 4.1), and we consider efficient replacement
strategies to be orthogonal to and outside the scope of this work.


3.1 PARDIS HYBRID 


The initial priority aware technique from Algorithm 1 can also encounter perfor- 
mance bottlenecks, however. The original motivation for using DD on lists of 
children in the AST was that its best case behavior is O(log(n)) where n is the 
number of children in the list. This is because it tries removing multiple children 
at the same time. Processing one node at a time, however, requires that every 
list element is considered individually, guaranteeing O(n) time for one round of 
Algorithm 1. Priority aware reduction that proceeds one node at a time faces a 
different set of inefficiencies that can still cause stalls in the reduction process. 

Thus, we desire a means of removing multiple elements from lists at the 
same time while still preserving priority awareness. In order to achieve this, we 
developed PARDIS HYBRID, as presented in Algorithm 2. This approach uses a 
modified prioritizer as presented in Fig. 4 that first orders by token weight, then 
by parent traversal order, then by node traversal order. The effect this has is 
that all children of the same parent with the same weight are grouped together. 
As a result, we can remove them from the priority queue together and perform 
list based reduction (like DD) to more efficiently remove groups of elements in 
a list that have the same priority (for instance, nodes 9 and 10 get removed as
a group in one trial using PARDIS HYBRID as shown in Fig. 2(c)). Because the 
search is still primarily directed by the token weights of the removed nodes, the 
technique still fully respects the priorities of the removed nodes. 


Algorithm 2: PARDIS HYBRID algorithm with priority aware list reduction.

Input: p : P, the program to reduce as an AST
Input: ψ : P → B, oracle for the property to preserve
Result: A minimum program p ∈ P s.t. ψ(p)
1  work ← MaxPriorityQueue({p.root}, ρ_PARDIS HYBRID)
2  while !work.empty() do
3    nodes ← work.takeWithSameWeightAndParent()
4    nullable, nonnullable ← partitionNullable(nodes)
5    removed, retained ← minimize(p, nullable, ψ)
6    p ← p − removed
7    work.insert(⋃_{x ∈ retained ∪ nonnullable} x.children)
8  return p


Similar to the previous approach, line 1 of Algorithm 2 starts by creating the 
priority queue. Note that it specifically uses the prioritizer ρ_PARDIS HYBRID, which
groups children having the same token weight in the priority queue. As long as 
there are more nodes to consider, line 3 takes all nodes from the queue with the 
same weight and parent. If the weight of a node is unique, this simply returns 
a list of length 1. Line 4 filters out non-nullable nodes from the trial, and line 
5 just applies list based reduction to any nullable nodes. Lines 6 and 7 then 
remove the eliminated nodes from the tree and add the children of remaining 
nodes to the work queue. Again, this algorithm actually runs to a fixed point. 
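The loop of Algorithm 2 can be sketched in Python as follows. Several assumptions apply. Token weights and BFS positions are precomputed and not updated after removals. The oracle is a zero-argument callable that inspects the `removed` flags. Most importantly, `minimize` is a deliberately simple stand-in, not the OPDD algorithm cited in the text: it tries the whole group first and then one node at a time. The usage example reproduces the situation described above, in which two weight-7 siblings leave the queue together and are removed in a single trial.

```python
import heapq
import itertools
from typing import Callable, List, Optional, Tuple


class Node:
    """AST node with precomputed token weight and bfsOrder (assumed given)."""

    def __init__(self, name: str, weight: int, order: int, nullable: bool = False,
                 children: Optional[List["Node"]] = None) -> None:
        self.name, self.weight, self.order = name, weight, order
        self.nullable = nullable
        self.children = children or []
        self.parent: Optional["Node"] = None
        self.removed = False
        for c in self.children:
            c.parent = self


def rho_hybrid(n: Node) -> Tuple[float, ...]:
    parent_order = n.parent.order if n.parent else float("inf")
    return (n.weight, parent_order, n.order)


def minimize(nodes: List[Node], oracle: Callable[[], bool]) -> List[Node]:
    """Simple stand-in for OPDD: try the whole group, then one node at a time."""
    if not nodes:
        return []
    for n in nodes:
        n.removed = True
    if oracle():
        return list(nodes)
    for n in nodes:
        n.removed = False
    if len(nodes) == 1:
        return []
    removed = []
    for n in nodes:
        n.removed = True
        if oracle():
            removed.append(n)
        else:
            n.removed = False
    return removed


def pardis_hybrid_pass(root: Node, oracle: Callable[[], bool]) -> None:
    """One pass of Algorithm 2 over a tree that is modified in place."""
    tie = itertools.count()
    work: List[Tuple[Tuple[float, ...], int, Node]] = []

    def push(n: Node) -> None:
        heapq.heappush(work, (tuple(-x for x in rho_hybrid(n)), next(tie), n))

    push(root)
    while work:
        first = heapq.heappop(work)[2]
        group = [first]
        # takeWithSameWeightAndParent: siblings of equal weight are adjacent.
        while (work and work[0][2].weight == first.weight
               and work[0][2].parent is first.parent):
            group.append(heapq.heappop(work)[2])
        nullable = [n for n in group if n.nullable]
        removed = minimize(nullable, oracle)
        for n in group:
            if n not in removed:              # retained or non-nullable
                for c in n.children:
                    push(c)


# Usage: two weight-7 siblings (like nodes 9 and 10) and a node that must stay.
calls: List[bool] = []
s9 = Node("s9", weight=7, order=8, nullable=True)
s10 = Node("s10", weight=7, order=9, nullable=True)
keep = Node("keep", weight=5, order=7, nullable=True)
root = Node("root", weight=19, order=10, children=[keep, s9, s10])


def oracle() -> bool:
    calls.append(not keep.removed)
    return not keep.removed


pardis_hybrid_pass(root, oracle)
```

Only two oracle queries are issued: one successful group trial for the twins and one failed trial for `keep`.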
While the worst case behavior of DD is O(n²) [2], this can be improved
to O(n) by giving up hard guarantees on minimality [13]. Since this reduction 
process is performed to a fixed point anyway, minimize on line 5 makes use of this 
O(n) approach to list based reduction (OPDD) without losing 1-minimality. As 
a result, the theoretical complexity of PARDIS HYBRID is the same as PARDIS. 


3.2 Nullability Pruning 


Finally, we observed that many oracle queries were simply unnecessary. Specif- 
ically, recall that a node can be tagged nullable because it is an element of a 
list or a child of an optional node, as previously defined by Perses grammar 
transformations [6]. The complete algorithm for this tagging is in TagNullable of 
Algorithm 3. However, for example, a list of one element could contain another 
list of one element. In the AST, this appears as a chain of nodes, at least two of 
which are nullable. Removing any one of these nodes removes the same tokens 
from the AST. Thus, it is only necessary to select a single nullable node from 
any chain of nodes, and the others can be disregarded. 

We exploit this through an optimization called nullability pruning. We tra- 
verse every chain of nodes in the AST, preserving the nullability of the highest 
node in the chain and removing nullability from those below it. The complete 
algorithm is presented in PruneNullable of Algorithm 3. In effect, it is just a 
depth first search that removes redundant nullability from nodes along the way.

In practice, we find that this can statically (ahead of time) prune most of the
nullable nodes from the search space. Specifically, in the benchmarks we examine in Sect. 4,
we find that of 1,593,875 total nullable nodes, 17% are redundant optional nodes 
and 44% are redundant list element nodes. We observe the impact of this pruning 
on the actual reduction process in Sect. 4.1. 


Algorithm 3: Nullability tagging and pruning.

1   Function TagNullable(p)
      Input: p : P, the program to reduce as an AST
2     foreach Node n ∈ p do
3       if n ∈ KleeneStar ∪ KleenePlus ∪ Optional then
4         foreach c ∈ n.children do c.isNullable ← true

5   Function PruneNullable(p)
      Input: p : P, the program to reduce as an AST
6     Function OptimizeBelow(n)
7       hasNullable ← false
8       Loop
9         if hasNullable then
10          n.isNullable ← false
11        else if n.isNullable then
12          hasNullable ← true
13        if |n.children| ≠ 1 then
14          break
15        n ← n.getOnlyChild()
16      foreach c ∈ n.children do OptimizeBelow(c)
17    OptimizeBelow(p.root)
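Algorithm 3 can be sketched in Python as follows. The `kind` strings in `QUANTIFIED` are hypothetical tags for nodes produced by `*`, `+` and `?` rules; the real grammar would identify them by rule name. In the usage example, a star list holds a single optional element, which in turn wraps a `+` list. After tagging, both the element and its only child are nullable, yet removing either one deletes the same tokens, so pruning keeps only the upper one.

```python
from typing import Iterator, List, Optional

QUANTIFIED = {"star", "plus", "optional"}  # hypothetical kind tags for *, + and ? rules


class Node:
    def __init__(self, kind: str, children: Optional[List["Node"]] = None) -> None:
        self.kind = kind
        self.children = children or []
        self.is_nullable = False


def walk(n: Node) -> Iterator[Node]:
    yield n
    for c in n.children:
        yield from walk(c)


def tag_nullable(root: Node) -> None:
    """TagNullable: children of quantified (*, +, ?) nodes may be removed."""
    for n in walk(root):
        if n.kind in QUANTIFIED:
            for c in n.children:
                c.is_nullable = True


def prune_nullable(root: Node) -> None:
    """PruneNullable: keep only the topmost nullable node of each single-child chain."""

    def optimize_below(n: Node) -> None:
        has_nullable = False
        while True:
            if has_nullable:
                n.is_nullable = False   # same tokens as a nullable ancestor in the chain
            elif n.is_nullable:
                has_nullable = True
            if len(n.children) != 1:    # the chain ends here
                break
            n = n.children[0]
        for c in n.children:
            optimize_below(c)

    optimize_below(root)


# Usage: a star list holding one optional element, which wraps a + list.
c1, c2 = Node("tok"), Node("tok")
b = Node("plus", [c1, c2])
a = Node("optional", [b])
root = Node("star", [a])
tag_nullable(root)
prune_nullable(root)
```

The recursion follows the algorithm's structure. Very deep ASTs would need an explicit stack in Python.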
4 Evaluation


We evaluate PARDIS’s performance and examine the impact of priority inversion 
on reduction by answering the following research questions: 


• RQ1. How does PARDIS perform compared to Perses in terms of reduction
time and speed, number of oracle queries, and size of the reduced test case?

• RQ2. Does priority inversion adversely affect the reduction efficiency? In
particular, does reduction require more work with a traversal order suffering
from priority inversion?


4.1 RQ1. Performance: PARDIS vs. Perses 


Experimental Set-Up. We evaluate PARDIS on the set of C test cases used 
in the evaluation of Perses, including the oracle scripts provided by authors of 
Perses. While using these, we observed that they still allowed for some unde- 
fined behavior [5,14], so we updated all oracles to reject test case variants with 
undefined behavior. As a result, we were able to reproduce bugs for 14 out of 
20 original test cases. The remaining six benchmarks, which no longer reproduced their
original failures, were excluded from this study. Since the implementation of Perses’
components is not publicly available, we implemented the Perses grammar trans- 
formations and reduction based on the algorithms available in the paper [6] using 
the C++ bindings of ANTLR [15]. All of our implementations have been made 
available². Our experiments were conducted on an Intel Xeon E5-2630 CPU and
64 GB of memory running Ubuntu.


Variants of Reduction Techniques. To better explain performance dif- 
ferences, we benchmark several algorithms that each add one difference. All 
approaches compute fixed points as previously described. 


• Perses DD: The removal-based algorithm of Perses that applies DD on
children of list nodes [6].

• Perses OPDD: The same as Perses DD but using the O(n) reduction
algorithm of OPDD [13]. It is faster than Perses DD in practice.

• Perses N: The one-node-at-a-time variant of Perses, which does not apply DD
on list elements but removes them one by one using Perses' parent-oriented priorities.

• PARDIS w/o PRUNING: This uses the PARDIS algorithm but does not apply the
nullability pruning optimization proposed in Sect. 3.2.

• PARDIS: Our proposed removal algorithm that also applies nullability
pruning.

• PARDIS HYBRID: The hybrid version of PARDIS with nullability pruning and
OPDD as its version of DD.


2 https://github.com/golnazgh/PARDIS
Table 1. Original and reduced test case size and number of oracle queries.

Bug         | O       | Perses DD    | Perses OPDD  | Perses N     | PARDIS w/o PRUNING | PARDIS       | PARDIS HYBRID
            |         | R      Q     | R      Q     | R      Q     | R      Q           | R      Q     | R      Q
clang-22382 | 21,068  | 597    5,323 | 597    4,865 | 354    3,203 | 354    2,702       | 354    2,011 | 354    2,319
clang-22704 | 184,444 | 250    4,181 | 250    3,775 | 220    5,083 | 236    4,956       | 236    4,342 | 236    2,253
clang-23309 | 38,647  | 1,624  8,688 | 1,624  8,095 | 1,522  6,106 | 1,726  4,618       | 1,726  ?     | 1,726  3,684
clang-25900 | 78,960  | 618    4,455 | 618    4,020 | 600    2,816 | 618    2,343       | 618    1,652 | 618    1,997
clang-27137 | 174,538 | 725    9,035 | 725    8,299 | 681    6,858 | 807    5,889       | 807    4,293 | 807    4,89?
clang-27747 | 173,840 | 379    3,171 | 379    2,845 | 311    1,773 | 313    1,418       | 313    1,074 | 308    1,218
clang-31259 | 48,799  | 821    4,457 | 821    4,073 | 821    3,282 | 538    2,464       | 538    1,662 | 538    1,853
gcc-64990   | 148,931 | 776    5,913 | 776    5,438 | 1,215  5,165 | 776    3,781       | 776    2,632 | 776    3,148
gcc-65383   | 43,942  | 462    5,503 | 462    5,002 | 486    3,502 | 598    2,559       | 598    1,839 | 598    2,204
gcc-66186   | 47,481  | 1,176  6,101 | 1,176  5,727 | 1,178  4,532 | 1,176  3,944       | 1,176  2,562 | 1,176  3,167
gcc-66375   | 65,488  | 1,232  7,989 | 1,232  6,780 | 1,198  4,202 | 1,232  4,512       | 1,232  3,036 | 1,232  3,85?
gcc-70127   | 154,816 | 600    5,610 | 600    5,201 | 593    3,700 | 600    3,063       | 600    2,240 | 600    2,723
gcc-70586   | 212,259 | 1,583  7,671 | 1,583  7,276 | 1,489  5,582 | 1,497  5,233       | 1,497  3,491 | 1,497  4,318
gcc-71626   | 6,133   | 58     1,151 | 58     1,135 | 58     1,013 | 58     330         | 58     264   | 58     228
geomean     | 70,300  | 609    5,126 | 609    4,705 | 583    3,670 | 574    2,881       | 574    2,066 | 574    2,270
median      | 72,224  | 672    5,556 | 672    5,102 | 640    3,951 | 609    3,422       | 609    2,401 | 609    2,521

O, R and Q denote the number of tokens in the original test case, the number of tokens in the reduced one, and the total number of oracle queries performed by the reduction technique, respectively.
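The reduction speed in Table 2 (tokens removed per second) appears to follow directly from the other columns. Assuming that E = (O − R)/T, with O and R taken from Table 1 and the reduction time T from Table 2, the Perses DD entry for clang-25900 can be checked by hand:

```latex
E = \frac{O - R}{T}, \qquad
E_{\text{clang-25900, Perses DD}}
  = \frac{78{,}960 - 618}{1{,}375\ \mathrm{s}}
  = \frac{78{,}342}{1{,}375\ \mathrm{s}}
  \approx 57\ \text{tokens/s}
```

This agrees with the value of 57 tokens/s listed for that cell.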


Reduction Performance. We compare these techniques in terms of the number of oracle queries (Q), reduction quality, i.e., the size of the final reduced test case (R), reduction time (T), and reduction speed, i.e., the average number of tokens removed per second (E). Results are presented in Tables 1 and 2. The best values of queries, time, and speed are highlighted for each test case. As can be seen, in all cases either PARDIS or PARDIS HYBRID outperforms all variants of Perses. Compared to the full removal-based Perses algorithm (Perses DD), our proposed algorithms reduce test cases 1.3x to 7.8x faster and with 46% to 80% fewer queries. The results across variants suggest that these benefits arise from priority awareness and nullability pruning. Due to fixed point computation, all approaches produce test cases from which no single token can be removed while still satisfying ψ (1-minimal) [2], but they can produce different final reduced test cases [2]. On average (geometric mean), PARDIS yields reduced test cases with 574 tokens, compared to 609 tokens for Perses DD.

Table 2. Reduction time and speed for different variants of reduction techniques.

Bug | Perses DD T / E | Perses OPDD T / E | Perses N T / E | PARDIS W/O PRUNING T / E | PARDIS T / E | PARDIS HYBRID T / E
clang-22382 | 3,198 / 6 | 3,122 / 7 | 3,489 / 6 | 3,057 / 7 | 2,977 / 7 | 2,094 / 10
clang-22704 | 1,527 / 121 | 1,304 / 141 | 5,243 / 35 | 3,323 / 55 | 3,219 / 57 | 1,160 / 159
clang-23309 | 2,571 / 14 | 2,414 / 15 | 1,920 / 19 | 1,423 / 26 | 1,007 / 37 | 1,062 / 35
clang-25900 | 1,375 / 57 | 1,220 / 64 | 1,025 / 76 | 690 / 114 | 526 / 149 | 518 / 151
clang-27137 | 6,972 / 25 | 6,379 / 27 | 5,717 / 30 | 4,428 / 39 | 3,423 / 51 | 3,538 / 49
clang-27747 | 1,194 / 145 | 1,060 / 164 | 771 / 225 | 571 / 304 | 463 / 375 | 453 / 383
clang-31259 | 1,698 / 28 | 1,577 / 30 | 1,471 / 33 | 1,239 / 39 | 814 / 59 | 800 / 60
gcc-64990 | 1,980 / 75 | 1,768 / 84 | 1,981 / 75 | 1,237 / 120 | 932 / 159 | 916 / 162
gcc-65383 | 1,762 / 25 | 1,615 / 27 | 1,304 / 33 | 892 / 49 | 704 / 62 | 699 / 62
gcc-66186 | 1,583 / 29 | 1,493 / 31 | 1,299 / 36 | 1,016 / 46 | 691 / 67 | 741 / 62
gcc-66375 | 2,782 / 23 | 2,568 / 25 | 1,851 / 35 | 1,705 / 38 | 1,173 / 55 | 1,311 / 49
gcc-70127 | 3,083 / 50 | 2,812 / 55 | 2,265 / 68 | 1,520 / 101 | 1,124 / 137 | 1,173 / 131
gcc-70586 | 4,417 / 48 | 4,119 / 51 | 3,450 / 61 | 2,545 / 83 | 1,791 / 118 | 1,984 / 106
gcc-71626 | 156 / 39 | 156 / 39 | 206 / 29 | 57 / 107 | 54 / 112 | 20 / 304
geomean | 1,900 / 36 | 1,750 / 40 | 1,740 / 40 | 1,202 / 58 | 933 / 75 | 807 / 86
median | 1,871 / 34 | 1,692 / 35 | 1,886 / 35 | 1,331 / 52 | 970 / 64 | 989 / 84

T is reduction time in seconds. E is the efficiency of removal (number of tokens removed per second).
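The derived columns and summary rows can be recomputed from the raw values. This is a minimal sketch, assuming E = (O - R) / T as the footnote of Table 2 implies and assuming the "geomean" rows are geometric means over the 14 bugs. The helper names are ours, not part of the PARDIS implementation. The gcc-71626 Perses DD time is taken as 156 s, which matches its removal speed of 39 tokens/s.

```python
from math import exp, log
from statistics import median

def speed(original_tokens, reduced_tokens, seconds):
    """Tokens removed per second (column E), assuming E = (O - R) / T."""
    return (original_tokens - reduced_tokens) / seconds

def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logarithms."""
    return exp(sum(log(v) for v in values) / len(values))

# Perses DD reduction times in seconds (Table 2, column T), one per bug.
perses_dd_times = [3198, 1527, 2571, 1375, 6972, 1194, 1698,
                   1980, 1762, 1583, 2782, 3083, 4417, 156]

# gcc-64990 reduced by PARDIS: O = 148,931, R = 776, T = 932 s.
print(round(speed(148931, 776, 932)))    # 159
print(median(perses_dd_times))           # 1871.0
print(round(geomean(perses_dd_times)))   # about 1,900
# Extremes of the speedup and query savings quoted in the text.
print(round(156 / 20, 1), round(1527 / 1160, 1))            # 7.8 1.3
print(round(1 - 2253 / 4181, 2), round(1 - 228 / 1151, 2))  # 0.46 0.8
```

The median of an even number of bugs is the mean of the two middle values, which is why it can differ from every individual entry.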

In addition, we graphed the reduction progress of each test case for the 
different variants. Fig. 5 shows the percentage of remaining tokens over time
during reduction. For the sake of space, we only include graphs for six of the test
cases. Note that the y-axis is log-scaled. PARDIS and PARDIS HYBRID show much
faster convergence to a reduced test case compared to Perses variants. Recall that 
the only factor differentiating Perses N from PARDIS W/O PRUNING is the order 
in which the queue of nodes is traversed. Unlike Perses N, PARDIS W/O PRUNING 
does not suffer from priority inversion and guides the reduction process based 
on token weights of the nodes to remove. As can be seen, this advantage leads 
to faster convergence to a reduced test case. We examine the impact of priority 
inversion on reduction speed more rigorously in Sect. 4.2. 


Replacement. As mentioned in Sect. 3, in addition to removing list and optional nodes, Perses also considers a replacement strategy for nodes that are neither lists nor optional. For instance, in Fig. 3, Perses will attempt to replace node (6) with node (4) because they both match the same grammar rule (compound_stmt). This replacement fails since required declarations would be removed and ψ would return false.
Including replacement significantly increases the work done by reduction. For 
completeness, we implemented Perses DD with replacement as described in their 
paper [6] and defined a four-hour timeout for the reduction process. In 11 out 
of 14 cases, Perses DD with replacement could not finish the reduction process 
before reaching the timeout. In the remaining three, it generated reduced test 
cases with the same size or slightly smaller while performing a significantly larger 
number of oracle queries (more than 3x over Perses DD without replacement). 
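Perses's exact replacement policy is specified in [6]. The sketch below rests on one assumption: a node may be replaced by any proper descendant derived from the same grammar rule. Under that assumption, the candidate replacements can be collected with a simple tree walk. Node and the rule names used here are hypothetical. Because both subtrees derive the same nonterminal, swapping one for the other keeps the program syntactically valid, although, as in the example above, it may still break the property ψ.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)               # compare nodes by identity, not structure
class Node:
    rule: str                      # grammar rule (nonterminal) the node derives
    children: list = field(default_factory=list)

def replacement_candidates(node):
    """Proper descendants of `node` derived from the same grammar rule."""
    found, stack = [], list(node.children)
    while stack:
        n = stack.pop()
        if n.rule == node.rule:
            found.append(n)
        stack.extend(n.children)   # keep searching below every descendant
    return found

inner = Node("compound_stmt", [Node("decl"), Node("expr_stmt")])
outer = Node("compound_stmt", [Node("decl"), inner])
print(len(replacement_candidates(outer)))  # 1: only `inner` qualifies
```

Every candidate costs at least one oracle query, which is one reason replacement makes reduction so much more expensive.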


4.2 RQ2. The Impact of Priority Inversion 


As shown in Fig. 5, avoiding priority inversion leads to faster convergence. One
explanation for this is that priority awareness may decrease the amount of work
required to remove a token (as seen in the motivating example). We explore
this in a case study on gcc-64990 with 148,931 tokens. The number of removal
attempts for a token is the number of times that token is considered for removal.
Removing any ancestor of a token in the AST will remove that token, so if a 
first attempt fails, a deeper ancestor may be attempted. We compute this for 
every token of the test case to get a sense of the work required for each token. 
A better traversal order of the AST should cause fewer overall token removal 
attempts. To measure only the impact of different traversal orders, we compare 
PARDIS W/O PRUNING with Perses N. As described in Sect. 4.1, they follow the 
exact same reduction rules and differ only in their traversal orders. 
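The per-token metric can be computed from a log of removal attempts. The sketch below is hypothetical, because the paper does not describe its instrumentation. The parent map, the (node, succeeded) log format and the function name are all assumptions. An attempt on a node counts once for every token in that node's subtree that is still present.

```python
def removal_attempts(parent, tokens, log):
    """Count, per token, the logged attempts that targeted the token itself or
    one of its AST ancestors while the token was still present.

    parent: maps each node id to its parent id (None for the root)
    tokens: ids of the leaf (token) nodes
    log:    ordered (node_id, succeeded) pairs, one per removal attempt
    """
    def ancestors_or_self(node):
        while node is not None:
            yield node
            node = parent[node]

    covering = {t: set(ancestors_or_self(t)) for t in tokens}
    attempts = {t: 0 for t in tokens}
    removed = set()
    for node, succeeded in log:
        for t in tokens:
            if t not in removed and node in covering[t]:
                attempts[t] += 1
                if succeeded:
                    removed.add(t)   # later attempts can no longer touch t
    return attempts

parent = {"root": None, "A": "root", "B": "root", "t1": "A", "t2": "A", "t3": "B"}
log = [("root", False), ("A", False), ("t1", True), ("B", True)]
print(removal_attempts(parent, ["t1", "t2", "t3"], log))  # {'t1': 3, 't2': 2, 't3': 2}
```

Applying such a function to the logs of Perses N and of PARDIS W/O PRUNING would yield the two distributions compared in Fig. 6.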

Figure 6 depicts histograms of the distributions of token removal attempts for 
PARDIS W/O PRUNING and Perses N. For clearer visualization, we show only the 
distributions for numbers of attempts less than or equal to 20. The Perses N distribution is skewed toward larger numbers of removal attempts, an indicator that more work is required to remove individual tokens. In addition, we test whether the difference between the two removal attempt distributions is statistically significant. We use a one-sided Wilcoxon rank-sum test [16] to determine whether the distribution of Perses N is indeed greater than that of PARDIS W/O PRUNING. The p-value computed for our data was less than 2.2e-16, which strongly supports this observation.

Fig. 5. Converging to a reduced test case in six variants of reduction techniques.

Fig. 6. Distributions of token removal attempts for PARDIS W/O PRUNING and Perses N.
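The paper does not say which implementation of the one-sided rank-sum test [16] it used. Below is a self-contained sketch that uses mid-ranks for ties and the large-sample normal approximation with tie correction and no continuity correction. That approximation is appropriate for samples as large as the roughly 149k tokens of gcc-64990. SciPy offers an equivalent test as `scipy.stats.mannwhitneyu(x, y, alternative='greater')`.

```python
import math
from collections import Counter

def rank_sum_greater(x, y):
    """One-sided Wilcoxon rank-sum test, H1: values in x tend to be larger.
    Mid-ranks for ties, tie-corrected normal approximation (no continuity
    correction). Returns (W, z, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))
    # Mid-rank of each distinct value: the average of the ranks it occupies.
    rank, pos = {}, 1
    for value in sorted(counts):
        c = counts[value]
        rank[value] = pos + (c - 1) / 2
        pos += c
    w = sum(rank[v] for v in x)                       # rank sum of sample x
    mean = n1 * (n + 1) / 2
    ties = sum(c ** 3 - c for c in counts.values()) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - ties)
    if var == 0:
        raise ValueError("all observations are tied")
    z = (w - mean) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2))             # upper tail P(Z >= z)
    return w, z, p

w, z, p = rank_sum_greater([6, 7, 8, 9, 10], [1, 2, 3, 4, 5])
print(w, p < 0.01)  # 40.0 True
```

In the study above, x would hold the per-token attempt counts of Perses N and y those of PARDIS W/O PRUNING. Because those counts are small integers, they contain many ties, which is why the tie correction matters.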


5 Discussion 


PARDIS HYBRID as a sweet spot in reducing test cases: As discussed earlier, unlike Perses, PARDIS HYBRID does not suffer from priority inversion because it prioritizes the search primarily by the token weight of the nodes being considered for removal. Moreover, unlike PARDIS, it does not strictly remove one node at a time: it allows nodes with the same weight and the same parent to be removed as a group. Hence, it can be considered a sweet spot in reducing test cases. We conduct two studies to explore this idea further.
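The grouping step that distinguishes PARDIS HYBRID from PARDIS can be sketched with a max-priority queue keyed on token weight. This is a hypothetical illustration. The heap-entry layout (negated weight, insertion counter, parent id, node id) and the name pop_group are assumptions. The delta debugging that would then run over the returned group is not shown.

```python
import heapq
from itertools import count

def pop_group(heap):
    """Pop the heaviest queued node together with every other queued node that
    has the same token weight and the same parent. Entries are tuples
    (-weight, tiebreak, parent_id, node_id), so heapq yields the heaviest first."""
    neg_w, _, parent, node = heapq.heappop(heap)
    group, deferred = [node], []
    while heap and heap[0][0] == neg_w:          # same weight as the popped node
        entry = heapq.heappop(heap)
        if entry[2] == parent:
            group.append(entry[3])               # same parent: joins the group
        else:
            deferred.append(entry)               # same weight, other parent
    for entry in deferred:                       # return the others to the queue
        heapq.heappush(heap, entry)
    return -neg_w, parent, group

tick = count()                                   # tiebreak keeps insertion order
heap = []
for weight, parent, node in [(5, "p1", "a"), (5, "p1", "b"),
                             (5, "p2", "c"), (3, "p1", "d")]:
    heapq.heappush(heap, (-weight, next(tick), parent, node))
print(pop_group(heap))  # (5, 'p1', ['a', 'b'])
```

Plain PARDIS corresponds to always taking a group of size one. The insertion counter also ensures that node ids are never compared when two entries tie on weight.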


(1) Oracle Verification Time. The number of oracle queries is a common metric used in similar studies to reason about reduction efficiency since it directly impacts the total reduction time [2,3,6,13,17]. Indeed, both PARDIS and PARDIS HYBRID perform fewer oracle queries and take less time than Perses. However, the number of oracle queries is not the only factor involved. The time required to run each of these queries, or oracle verification time, also affects the total running time. For instance, as presented in Sect. 4.1, PARDIS has the smallest number of oracle queries in 12 out of 14 test cases. However, in terms of total reduction time and speed, PARDIS HYBRID is the fastest in 8 out of 14 cases, even while performing more queries than PARDIS in 6 of them. Oracle verification time can depend on multiple factors, such as the size and complexity of the test case. Since PARDIS HYBRID can remove more than one node at a time, it may try variants of the test case that are smaller, and therefore possibly faster to verify, than those tried by PARDIS. To check this hypothesis, we conducted a case study on gcc-64990 and recorded the running time of each oracle query during reduction. As shown in Tables 1 and 2, PARDIS reduces this test case in 932 s with 2,632 queries. PARDIS HYBRID has a total reduction time of 916 s (16 s shorter) while performing 3,148 oracle queries (516 more). Both techniques yield the same final test case.
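A rough check is consistent with this hypothesis. Dividing each total reduction time by its number of queries gives an upper bound on the mean oracle verification time. It is only an upper bound because the total also includes non-oracle work. This arithmetic is not the per-query measurement described above:

```latex
\bar{t}_{\mathrm{PARDIS}} \le \frac{932\ \mathrm{s}}{2632} \approx 0.354\ \mathrm{s},
\qquad
\bar{t}_{\mathrm{HYBRID}} \le \frac{916\ \mathrm{s}}{3148} \approx 0.291\ \mathrm{s}
```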

Figure 7 depicts the distributions of oracle verification times in PARDIS and PARDIS HYBRID. It shows that PARDIS has more long-running queries than PARDIS HYBRID. The shorter queries in PARDIS HYBRID directly decrease its overall reduction time. As a result, it reduces test cases with fewer queries than Perses and with shorter queries than PARDIS.

Fig. 7. … time for PARDIS and PARDIS HYBRID.
Fig. 8. … nodes visited during PARDIS reduction.


(2) Distribution of Token Weights. As discussed in Sect. 3.1, the motivation behind proposing PARDIS HYBRID was as follows: if lists in a test case shrink after removing nodes with large unique token weights, then applying DD to list elements with the same weight can be beneficial. In fact, the more of the remaining nodes share token weights, the more beneficial DD becomes, since it provides the opportunity to remove those nodes in a single trial. This can avoid the possibly time-consuming process of visiting nodes one by one. To understand the distribution of token weights in practice, we run PARDIS (the one-node-at-a-time removal) on gcc-64990 and record the token weights of the nodes visited during the removal process. Figure 8 shows this distribution; the median token weight of the nodes visited during the reduction is 5. The small median motivates the use of PARDIS HYBRID in practice. It indicates that at least half of the nodes have one of only five different token weights and can benefit from grouped removals.
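The last step of this argument can be spelled out under two assumptions: every visited node v has a positive integer token weight w(v), and n nodes are visited. A median of 5 means at least half of the nodes have weight at most 5. Positive integers no larger than 5 can take only five values:

```latex
\operatorname*{median}_{v}\, w(v) = 5,\quad w(v) \in \mathbb{Z}_{\ge 1}
\;\Longrightarrow\;
\bigl|\{v : w(v) \le 5\}\bigr| \ge \tfrac{n}{2}
\quad\text{and}\quad
\{\, w(v) : w(v) \le 5 \,\} \subseteq \{1, 2, 3, 4, 5\}.
```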

Syntactic vs Semantic Validity: Perses and PARDIS discard syntactically 
invalid variants of the test case during reduction. However, there are also seman- 
tically invalid queries such as removing the declaration of a variable before remov- 
ing its use. SGPR techniques cannot entirely avoid these queries since they guide 
the reduction process based on the syntax of the grammar. However, the priority 
order of PARDIS can mitigate this problem. By prioritizing by token weight, it 
is more likely to visit and remove uses before spending effort on declarations. 
One reason for this is that a higher token weight tends to mean that there are 
more uses beneath that node. For instance, in Fig. 3, the uses of variables a, b, and c are descendants of node (1), and nodes (8), (?), and (?) are their declarations. PARDIS removes the uses by first removing (1), while Perses tries to remove the declarations first due to priority inversion. Hence, PARDIS removes in one pass of the AST nodes that Perses may need its fixed point mode to remove.
Threats to Validity: We evaluated PARDIS on the same set of C test cases 
used in the evaluation of Perses. The implementation of Perses’ grammar trans- 
formations and reduction is not publicly available, so we reimplemented Perses 
as described in its paper. Our implementation has been made available to pro- 
vide a consistent platform for future work. However, the exact implementations, 
environmental settings and the scripts to check the property of interest can 
all impact the final results. For instance, the final sizes of the reduced test 
cases reported for the original Perses’ implementation [6] are smaller than those 
using our reimplemented version of Perses. As discussed in Sect. 4.1, this may be 
because Perses’ oracles allowed for undefined behavior, which can lead to
smaller but invalid reduced test cases. To mitigate this problem, we made the
oracles strictly prevent undefined behavior for both PARDIS and Perses. Note 
that PARDIS significantly outperforms both Perses’ original implementation [6] 
and our reimplementation in terms of number of oracle queries. 

While the techniques presented in PARDIS are general, our evaluation
focuses on C in order to compare with Perses. Further investigation is
required to claim that the performance benefits extend to other languages. 


6 Related Work 


The closest work to this paper is Perses [6]. Unlike PARDIS, it suffers from 
priority inversion that adversely affects the reduction speed. Other generic 
test case reduction techniques are Delta Debugging (DD) [2], its O(n) vari- 
ant [13], and Berkeley Delta [18]. These face challenges when reducing hier- 
archical inputs. Several techniques focus on reducing hierarchically structured 
test cases [3,4,6,11,12,19,20]. Among these, only Perses is priority aware, in 
spite of its priority inversion. Indeed, most techniques process the input level by 
level. Like PARDIS, Perses and Simp [20] are notable exceptions in that they can 
search across levels when deciding how to reduce. However, Simp is specific to
SQL queries. GTR [12] is notable in that it learns when to apply different
reduction operations. Finally, C-Reduce [5] is a tool for reducing C/C++ test 
cases that requires extensive domain-specific knowledge. 


7 Conclusions 


We have shown that the prior state of the art for test case reduction suffers from 
priority inversion and that this causes a significant increase in reduction time. 
We proposed priority aware reduction techniques, PARDIS and PARDIS HYBRID, 
that focus reduction effort where they can have the most impact. These tech- 
niques can speed reduction by 1.3x to 7.8x over the prior state of the art. 




Acknowledgements. This research was partially supported by the Natural Sciences 
and Engineering Research Council of Canada. 


References 


1. Clapp, L., Bastani, O., Anand, S., Aiken, A.: Minimizing GUI event traces. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, 13-18 November 2016, pp. 422-434 (2016)
2. Zeller, A., Hildebrandt, R.: Simplifying and isolating failure-inducing input. IEEE Trans. Softw. Eng. 28(2), 183-200 (2002)
3. Misherghi, G., Su, Z.: HDD: hierarchical delta debugging. In: 28th International Conference on Software Engineering (ICSE 2006), Shanghai, China, 20-28 May 2006, pp. 142-151 (2006)
4. Misherghi, G.S.: Hierarchical delta debugging. Master’s thesis, University of California Davis (2007, Approved)
5. Regehr, J., Chen, Y., Cuoq, P., Eide, E., Ellison, C., Yang, X.: Test-case reduction for C compiler bugs. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2012, Beijing, China, 11-16 June 2012, pp. 335-346 (2012)
6. Sun, C., Li, Y., Zhang, Q., Gu, T., Su, Z.: Perses: syntax-guided program reduction. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, 27 May-03 June 2018, pp. 361-371 (2018)
7. Kernighan, B.W., Ritchie, D.: The C Programming Language, 2nd edn. Prentice-Hall, Upper Saddle River (1988)
8. Albert, J., Giammaressi, D., Wood, D.: Extended context-free grammars and normal form algorithms. In: Champarnaud, J.-M., Ziadi, D., Maurel, D. (eds.) WIA 1998. LNCS, vol. 1660, pp. 1-12. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48057-9_1
9. Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and understanding bugs in C compilers. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, 4-8 June 2011, pp. 283-294 (2011)
10. IBM Support, Test Case Reduction Techniques. http://www-01.ibm.com/support/docview.wss?uid=swg21084174
11. Hodovan, R., Kiss, A.: Coarse hierarchical delta debugging. In: Proceedings of the 33rd IEEE International Conference on Software Maintenance and Evolution, ICSME 2017, Shanghai, China, 20-22 September 2017, pp. 194-203 (2017)
12. Herfert, S., Patra, J., Pradel, M.: Automatically reducing tree-structured test inputs. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, 30 October-03 November 2017, pp. 861-871 (2017)
13. Gharachorlu, G., Sumner, N.: Avoiding the familiar to speed up test case reduction. In: 2018 IEEE International Conference on Software Quality, Reliability and Security, QRS 2018, Lisbon, Portugal, 16-20 July 2018, pp. 426-437 (2018)
14. Hathhorn, C., Ellison, C., Rosu, G.: Defining the undefinedness of C. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, 15-17 June 2015, pp. 336-345 (2015)
15. Parr, T.: The Definitive ANTLR 4 Reference, 2nd edn. Pragmatic Bookshelf, Raleigh (2013)
16. Wild, C., Seber, G.: Chance Encounters: A First Course in Data Analysis and Inference, 1st edn. Wiley, New York (1999)
17. Hodovan, R., Kiss, A.: Practical improvements to the minimizing delta debugging algorithm. In: Proceedings of the 11th International Joint Conference on Software Technologies (ICSOFT 2016) - Volume 1: ICSOFT-EA, Lisbon, Portugal, 24-26 July 2016, pp. 241-248 (2016)
18. McPeak, S., Wilkerson, D.S., Goldsmith, S.: Delta, July 2003. http://delta.stage.tigris.org/
19. Kiss, Á., Hodován, R., Gyimóthy, T.: HDDr: a recursive variant of the hierarchical delta debugging algorithm. In: Proceedings of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation, A-TEST 2018, pp. 16-22 (2018)
20. Bruno, N.: Minimizing database repros using language grammars. In: Proceedings of 13th International Conference on Extending Database Technology, EDBT 2010, Lausanne, Switzerland, 22-26 March 2010, pp. 382-393 (2010)


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Automatically Identifying Sufficient 
Object Builders from Module APIs 


Pablo Ponzio!3(@), Valeria S. Bengolea!, Mariano Politano!, 


Nazareno Aguirre!?, and Marcelo F. Frias? 


1 Universidad Nacional de Rio Cuarto, Río Cuarto, Argentina
{pponzio,vbengolea,mpolitano,naguirre}@dc.exa.unrc.edu.ar
2 Instituto Tecnológico de Buenos Aires (ITBA), Buenos Aires, Argentina
mfrias@itba.edu.ar
3 Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina


Abstract. Various approaches to software analysis (e.g., test input generation, software model checking) require engineers to (manually) identify a subset of a module’s methods in order to drive the analysis. Given a module to be analyzed, engineers typically select a subset of its methods to be considered as object builders. These builders define a so-called driver, which will be used to automatically build objects for analysis, e.g., by combining them non-deterministically, randomly, etc. This requires a careful inspection of the module and its API, since the selection affects both the relative exhaustiveness of the analysis and its efficiency. Leaving important methods out may systematically prevent the generation of different objects. Meanwhile, the number of different bounded combinations of methods grows exponentially as the number of methods increases.

We propose an approach for automatically selecting a set of builders from a module’s API, based on an evolutionary algorithm that favors sets of methods whose combinations lead to producing larger sets of objects. The algorithm also takes into account other characteristics of these sets of methods, trying to prioritize the selection of methods with fewer and simpler parameters. Implementing this evolutionary mechanism requires, in principle, handling and comparing large sets of objects, which grow very quickly in both space and running time. We therefore employ an abstraction of sets of objects, called field extensions, that uses the field values of the objects in the set instead of the actual objects. This abstraction enables us to implement our mechanism effectively. An experimental assessment on a benchmark of stateful classes shows that our approach can automatically identify sets of builders that are sufficient (can be used to create any instance of the module) and minimal (do not contain superfluous methods), in a reasonable time.


1 Introduction 




As software becomes more ubiquitous thanks to rapid advances in technology, guaranteeing the functional correctness of software is more crucial than ever. Thus, a research area of growing importance is that of automated software analysis. Its goal is to assist engineers, through the provision of tools for automated analysis, in finding deficiencies both in software and in software-related models. Automated test generation [1,11,13,17,24,25,28,29,32], software model checking [9,34,35], and static analyses [6,16], among many others, are prominent approaches in this line of research.

While these techniques are in many cases fully automated, their application often requires some effort from engineers. Software model checkers rely on the definition of drivers, programs that allow one to build inputs for the code under analysis. Similarly, in parameterized unit testing approaches [33] a mechanism for building inputs is mandatory. Some tools based on symbolic execution require so-called “object factories” to build test cases involving inputs with non-primitive types [32]. Automated test generation techniques based on a module’s API can be used to build inputs of non-primitive types [11,24], thus automating the above-mentioned input-generation tasks. However, they usually have difficulty generating a good set of diverse inputs for complex stateful structures, and this is even harder for structures with rich APIs [26]. Many authors have addressed this problem by defining different approaches for guiding test generation to create more diverse sets of inputs [7,26].

In this paper, we take a different approach to address the problem of generating better inputs for stateful modules. We observe that the choice of routines from a module API has a crucial impact on the analysis. These routines are fed to an input generation tool to build input structures for program analysis (drivers for model checking, input structures for parameterized unit tests, etc.). We call builders a set of routines B, drawn from the API of a module M, that can be employed to create input structures in an automated program analysis for M (e.g., a driver for model checking). Clearly, the higher the number of different structures that can be created with B, the better the chances of finding bugs in M. The number of instances of a software module is potentially infinite, and the program analyses we target are also limited in the number of structures they can employ. We therefore limit ourselves to a bounded-exhaustive set of structures for M [4] (e.g., all the instances of a linked list with up to k nodes). We denote this set by BE(M,k). We say that a set of builders B is sufficient if its routines can be combined to build all the instances in BE(M,k). Thus, sufficient builders are the best possible choice for bug finding (in a bounded setting). Notice that B can contain superfluous routines. A routine s is superfluous if BE(M,k) can be built using the routines in B \ {s} (the simplest example being routines that never change the state of their parameters). Such routines provide no benefit in terms of the bug-finding capabilities of the analysis. We call a set of builders minimal if it has no superfluous routines. Minimality is important because providing an analysis tool with superfluous routines often negatively impacts its efficiency (the number of ways k routines can be combined usually increases exponentially with k).
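The two definitions can be made concrete on a deliberately tiny, hypothetical model of a list with a node cache. The model is loosely inspired by the NCL example of Sect. 2 but does not reproduce its real semantics. A state records only how many nodes sit in the main list and in the cache, and the bound k caps their total. The routine names mirror NCL's API, but their effects here are simplified assumptions. The function reachable plays the role of "the structures buildable within the bound".

```python
def successors(state, builders):
    """Abstract effect of each routine on a (main, cache) state (toy semantics)."""
    main, cache = state
    out = []
    if "addFirst" in builders or "addLast" in builders:
        # Adding reuses a cached node if there is one, else allocates a new node.
        out.append((main + 1, cache - 1) if cache > 0 else (main + 1, cache))
    if "removeFirst" in builders and main > 0:
        out.append((main - 1, cache + 1))      # the removed node goes to the cache
    # "size" is an observer: it never produces a new state.
    return out

def reachable(builders, k):
    """All states buildable with `builders` holding at most k nodes in total."""
    if "new" not in builders:                  # without a constructor, nothing
        return frozenset()
    seen, frontier = {(0, 0)}, [(0, 0)]
    while frontier:
        for nxt in successors(frontier.pop(), builders):
            if sum(nxt) <= k and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return frozenset(seen)

def is_sufficient(builders, api, k):
    """Builds everything the whole API can build within the bound."""
    return reachable(builders, k) == reachable(api, k)

def is_minimal(builders, k):
    """No routine can be dropped without losing some buildable state."""
    full = reachable(builders, k)
    return all(reachable(builders - {s}, k) != full for s in builders)

API = frozenset({"new", "addFirst", "addLast", "removeFirst", "size"})
B = frozenset({"new", "addFirst", "removeFirst"})
print(is_sufficient(B, API, 3), is_minimal(B, 3))      # True True
print(is_sufficient(API, API, 3), is_minimal(API, 3))  # True False
```

Swapping addFirst for addLast in B gives another sufficient and minimal set, mirroring the observation that such sets are not unique.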

Manually selecting sufficient and minimal builders is not an easy task: it requires a thorough analysis of the available routines and a deep understanding of the program semantics. This is especially hard for programs with rich APIs, where there are many routines and a lot of redundancy in the API (see Sect. 2). We propose an automated approach for identifying such a sufficient and minimal set of builders. It is based on an evolutionary algorithm that searches for a minimal set of routines capable of generating the maximum number of different (bounded) objects (i.e., BE(M,k)). Moreover, our evolutionary approach also takes into account other characteristics of the builders, such as the number and complexity of their parameters, so that “simpler” routines are favored in the search. The goal is to choose builders that the subsequent program analyses can use more easily and more efficiently.

The fitness value for a set of routines R is based on the number of bounded structures that can be generated using combinations of these routines. To compute the fitness, we use a modified version of a random test case generation tool (Randoop [24]) to generate as many bounded structures as possible from R. The generated structures contain at most k objects of each type, where k is a parameter of our algorithm. Sets of objects are very expensive to maintain and manipulate, both in terms of space and running time. We therefore employ an efficient abstraction of a set of objects, called field extensions, defined as the set of field values appearing in any of the objects in the set [25]. Thus, instead of counting the number of different objects achieved by a candidate, the fitness function computes the field extensions as objects are generated and returns the number of field values in the extensions. Intuitively, a higher number of field values in the field extensions means that the builders can be used to construct a more diverse set of objects, and therefore they should be preferred over other sets of builders.
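The fitness computation just described can be illustrated with a small, hypothetical model. Two substitutions are assumptions of this sketch, not the paper's implementation. First, call sequences are enumerated exhaustively up to a fixed length instead of being generated randomly by the modified Randoop. Second, the module is a toy list with a node cache (CachedList, add_first and remove_first are invented names). Node identities are canonicalized by traversal order, so isomorphic objects contribute the same field values.

```python
from itertools import product

class Node:
    def __init__(self):
        self.next = None

class CachedList:
    """Hypothetical toy: a singly linked main list plus a cache of spare nodes."""
    def __init__(self, k):
        self.head = self.cache = None
        self.k, self.allocated = k, 0            # at most k nodes ever allocated

    def add_first(self):
        if self.cache is not None:               # reuse a cached node
            node, self.cache = self.cache, self.cache.next
        elif self.allocated < self.k:            # allocate within the bound
            node, self.allocated = Node(), self.allocated + 1
        else:
            return                               # bound reached: no effect
        node.next, self.head = self.head, node

    def remove_first(self):
        if self.head is not None:                # move the first node to the cache
            node, self.head = self.head, self.head.next
            node.next, self.cache = self.cache, node

def field_values(lst):
    """(field, source, target) triples, naming nodes N0, N1, ... in traversal
    order: first the main list from head, then the cache."""
    order = []
    for start in (lst.head, lst.cache):
        node = start
        while node is not None:
            order.append(node)
            node = node.next
    name = {node: f"N{i}" for i, node in enumerate(order)}

    def ref(node):
        return "null" if node is None else name[node]

    triples = {("head", "L0", ref(lst.head)), ("cache", "L0", ref(lst.cache))}
    triples |= {("next", name[n], ref(n.next)) for n in order}
    return triples

def fitness(builders, k=2, max_len=4):
    """Number of field values in the field extensions of all objects built by
    call sequences of length <= max_len (exhaustive enumeration here)."""
    extensions = set()
    for length in range(max_len + 1):
        for seq in product(builders, repeat=length):
            lst = CachedList(k)
            for routine in seq:
                getattr(lst, routine)()
            extensions |= field_values(lst)
    return len(extensions)

print(fitness(["add_first"]))                  # 6
print(fitness(["add_first", "remove_first"]))  # 8
```

Adding remove_first raises the fitness from 6 to 8, because only then can the cache field point to a node. A routine that never changes state would leave the score unchanged, which is what makes it superfluous.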

We assess our approach experimentally on a benchmark of stateful Java 
classes drawn from the literature. The results show that in our case studies our 
approach identifies sets of routines that are sufficient and minimal, in a reason- 
able time. We also assess the impact of our approach in an automated analysis, 
namely, in the generation of test cases for parameterized tests. We compare how 
the random test case generation tool Randoop behaves when fed with the full 
module API, against providing the tool with only the builders identified by our 
approach. The results indicate that in the latter case Randoop generated more 
(and larger) objects, within a fixed time budget. 


2 Motivating Example 


In this section, we motivate our approach by means of a running example. The 
Apache NodeCachingLinkedList (NCL for short) [36] consists of a main circular 
doubly linked list, that holds the elements of the collection, and a secondary 
singly linked list that acts as a cache for nodes that have been removed from 
the main list. Nodes stored in the cache can be reused and added back to
the main list when new elements are inserted. Thanks to its cache, in
applications where insertions and removals from the list are very frequent, NCL 
can significantly reduce the overhead needed for memory allocation and garbage 
collection of nodes. As an illustration, Fig. 1 shows the three NCL instances that 
can be built with exactly two nodes. 
Fig. 1. Three NodeCachingLinkedList instances with exactly two nodes


Table 1. Apache’s NodeCachingLinkedList API

No. | Return type | Method name | Obs?
0 | - | NCL() | no
1 | - | NCL(int) | no
2 | - | NCL(Collection) | no
3 | boolean | add(Object) | no
4 | void | add(int,Object) | no
5 | boolean | addAll(Collection) | no
6 | boolean | addAll(int,Collection) | no
7 | boolean | addFirst(Object) | no
8 | boolean | addLast(Object) | no
9 | void | clear() | no
10 | boolean | contains(Object) | yes
11 | boolean | containsAll(Collection) | yes
12 | boolean | equals(Object) | yes
13 | Object | get(int) | yes
14 | Object | getFirst() | yes
15 | Object | getLast() | yes
16 | int | indexOf(Object) | yes
17 | boolean | isEmpty() | yes
18 | Iterator | iterator() | no
19 | int | lastIndexOf(Object) | yes
20 | ListIterator | listIterator() | no
21 | ListIterator | listIterator(int) | no
22 | Object | remove(int) | no
23 | boolean | remove(Object) | no
24 | boolean | removeAll(Collection) | no
25 | Object | removeFirst() | no
26 | Object | removeLast() | no
27 | boolean | retainAll(Collection) | no
28 | Object | set(int,Object) | no
29 | int | size() | yes
30 | List | subList(int,int) | no
31 | Object[] | toArray() | yes
32 | Object[] | toArray(Object[]) | yes
33 | String | toString() | yes


NCL has a very rich API, as shown in Table 1. However, only a few methods from the API suffice for building any feasible NCL object. For example, combinations of the methods in Fig. 1.1, when instantiated with appropriate parameters, can be used to build any desired (finite) NCL object. Thus, these methods are an example of a sufficient set of builders. Notice that, after using the constructor, the main list of NCL can be populated just by using the addFirst method. However, if we want to generate instances where the cache is not empty, we can do so through the removeFirst method, as the sufficient set of builders suggests. For most automated analyses, we would like to consider as many different scenarios (inputs) as possible, hence the motivation to build sufficient sets of builders. Furthermore, the builders in Fig. 1.1 are also minimal, since without any one of them some NCL objects could no longer be constructed with the remaining routines.
(0) NodeCachingLinkedList()
(7) addFirst(Object)
(25) removeFirst()


Figure 1.1. A sufficient set of builders for NCL 


(3) add(Object)
(4) add(int,Object)
(7) addFirst(Object)
(8) addLast(Object)


Figure 1.2. Add variants that can be used to populate NCL’s main list 


Notice that there can be many sets of sufficient and minimal builders. For 
example, we get sufficient and minimal builders by replacing addFirst in Fig. 1.1 
with any of the other add variants shown in Fig. 1.2, as for any way of filling up 
NCL’s main list with addFirst there exists a different way to build the same 
object using another add variant (perhaps invoked with different parameters and 
changing the execution order). 

We also observe that the simpler the parameters of a routine, the easier the
routine is to use for generating inputs in the context of a program analysis.
For instance, among the alternative add routines for NCL (Fig. 1.2),
add(int, Object) receives more parameters than the other three methods, so
it is harder to generate arguments for it when generating inputs. This makes
the other three alternatives preferable. Thus, our approach takes into account
the number of parameters and their complexity when selecting the best possible
builders.
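One simple way to quantify parameter complexity is a weighted count of parameter types. The following Java sketch is a hypothetical helper (the name ParamComplexity and its signatures are ours) that uses standard reflection to count each primitive parameter as 1 and each reference parameter as refWeight, the role played by w3 in the fitness function of Sect. 4.2. java.util.ArrayList stands in for NCL here, since both offer add(Object) and add(int, Object).

```java
import java.lang.reflect.Executable;

// Hypothetical helper (not part of the paper's tool): weighted parameter count of a
// routine, where primitives count 1 and reference parameters count refWeight.
class ParamComplexity {

    /** Works for both methods and constructors (both extend Executable). */
    static double cost(Executable routine, double refWeight) {
        double total = 0;
        for (Class<?> p : routine.getParameterTypes()) {
            total += p.isPrimitive() ? 1.0 : refWeight;
        }
        return total;
    }

    /** Convenience lookup of a public method by name and parameter types. */
    static double cost(Class<?> owner, String name, double refWeight, Class<?>... params) {
        try {
            return cost(owner.getMethod(name, params), refWeight);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }
}
```

With refWeight = 2, add(Object) costs 2 and add(int, Object) costs 3, so the latter would be ranked as the harder routine to instantiate.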

Many methods in Table 1 are marked as observers (column Obs?), meaning 
that they do not modify the objects they operate on, nor are they useful for
creating non-primitive objects. Hence, observers are always superfluous, and 
should never be included in a set of minimal builders. Our approach tries to 
recognize them beforehand, and discards them from the search to significantly 
reduce the search space. 

To conclude this section we remark that, when fed with the whole NCL’s API, 
our approach automatically identified the sufficient and minimal set of builders 
for NCL shown in Fig. 1.1. 


3 Background 


3.1 Field Extensions 


The idea behind field extensions [25] is to define a representation for a set of 
objects that is smaller in size and easier to manipulate algorithmically. This 
representation implies some loss of information, but for certain applications (like
the one in this paper) it is precise enough to be useful in practice [1, 12, 25,
26, 29]. 
head = (L0, null), (L0, N0)

cache = (L0, null), (L0, N1), (L0, N0)

next = (N0, N1), (N1, N0), (N0, N0), (N1, null)
prev = (N0, N1), (N0, N0), (N1, null), (N0, null)


Figure 1.3. Field extensions for the set of instances in Fig. 1


Given a set S of objects, its field extensions representation consists of a set
of pairs for each field f, such that (obj, val) belongs to the field extensions of f if
and only if obj.f = val (i.e., the value of f for obj equals val), for some object obj in S.
As an example, consider the instances displayed in Fig. 1. Their corresponding field
extensions are shown in Fig. 1.3. We omit the values stored in the nodes for the
sake of clarity. Notice that structure (a) in Fig. 1 can be built using only add 
methods, whereas for (b) and (c) we have to also employ some kind of remove 
operation, to move nodes from the main list to the cache. Notice that values 
(L0, N0) and (L0, N1) for the cache field only appear in the field extensions when
the structures have nodes in the cache, like (b) and (c). In addition, prev fields of 
nodes in the cache are always null, but prev fields can never be null in the main 
list (due to its circularity). This means that field extensions for structures that 
have non-empty caches have the potential of having a larger number of values 
than those for structures with no caches. 
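A direct way to store field extensions is a map from field names to sets of pairs. The Java sketch below is illustrative only (the class name and the string encoding of objects such as L0 and N1 are our assumptions); its size() corresponds to the number of values in the field extensions, the quantity later used by the fitness function.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical representation of field extensions: for each field name, the set
// of (object, value) pairs observed in some structure of the analysed set.
class FieldExtensions {
    private final Map<String, Set<List<String>>> ext = new TreeMap<>();

    /** Records that obj.field == val holds for some object in the set. */
    void add(String field, String obj, String val) {
        ext.computeIfAbsent(field, f -> new HashSet<>()).add(List.of(obj, val));
    }

    /** Total number of pairs over all fields. */
    int size() {
        int n = 0;
        for (Set<List<String>> pairs : ext.values()) {
            n += pairs.size();
        }
        return n;
    }
}
```

Recording the pairs listed in Fig. 1.3 gives 13 values in total; adding a pair that is already present leaves the count unchanged.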

It is important to canonicalize structures before computing field extensions [12].
Canonicalization involves assigning unique identifiers N0, N1, ... to each of the
nodes of a structure during a traversal (we employ a breadth-first traversal),
starting at the root. Nodes visited first receive smaller identifiers than those
visited later during the traversal. Fields must be visited in a fixed order. Note
that the structures in Fig. 1 are all in canonical breadth-first form.
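The canonicalization step can be sketched as a breadth-first traversal that hands out identifiers in visiting order. In the Java sketch below, CNode is a hypothetical two-field node, the traversal starts at a node rather than at the list object, and visiting next before prev is our own choice of fixed field order.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical two-field node, used only to illustrate canonicalization.
class CNode {
    CNode next, prev;
}

class Canonicalizer {
    /**
     * Assigns N0, N1, ... to the nodes reachable from root in breadth-first order,
     * always visiting field next before field prev (our choice of fixed order).
     */
    static Map<CNode, String> canonicalize(CNode root) {
        Map<CNode, String> ids = new IdentityHashMap<>(); // identity, not equals()
        if (root == null) return ids;
        Deque<CNode> queue = new ArrayDeque<>();
        ids.put(root, "N0");
        queue.add(root);
        while (!queue.isEmpty()) {
            CNode n = queue.poll();
            for (CNode succ : Arrays.asList(n.next, n.prev)) {
                if (succ != null && !ids.containsKey(succ)) {
                    ids.put(succ, "N" + ids.size()); // next free identifier
                    queue.add(succ);
                }
            }
        }
        return ids;
    }
}
```

For a root r with r.next = y and r.prev = x, the sketch assigns N0 to r, N1 to y and N2 to x. An IdentityHashMap is used because nodes must be distinguished by identity, not by equals.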


3.2 Random Test Case Generation 


Random test generation consists of randomly producing inputs in order to test 
software [8, 21, 24]. Random input generation is straightforward when considering
basic (numeric) data types, but producing inputs of more complex types, in
particular instances of stateful classes, is less obvious and calls for a more
elaborate mechanism than just random number generators. One such mechanism,
implemented by various tools for random test generation for object-oriented
code, is based on randomly combining method sequences that produce inputs of
different types [8, 21, 24]. The process implemented by the Randoop tool [24],
which we use here, essentially works as follows. For every data type, a set of
sequences that produce inputs of that type is maintained. To start with, for
basic data types a set of initial values is considered, and for class types
only null is considered at first (these can be considered
test sequences of size one). The procedure to build a new test sequence starts 
by randomly selecting a method m, among all methods in the software under 
test. For example, one could randomly choose one of the methods for the NCL’s 
API (Table 1), say add(Object). To actually build the test sequence, values for 
each of the parameters of method m, of the corresponding types, have to
be provided. These are obtained by randomly selecting test sequences from the
sets of sequences of the corresponding types, and sequentially composing them,
with method m as a last statement. As an example, say that a sequence con- 
taining only the constructor of NCL is randomly selected, from the available 
sequences for the NCL type, and for the parameter of add, an Integer with value 
0 is randomly chosen. Combining all these sequences together results in: 


NodeCachingLinkedList l = new NodeCachingLinkedList();
l.add(new Integer(0));


This new sequence can now be stored for later use as a parameter for other
methods that operate on NCL objects. 

This process is repeated until either a time budget is exhausted, or the desired 
number of tests (set by the user) is generated. Randoop uses guidance from the 
execution of tests to avoid generating illegal tests. We refer the interested reader 
to the article introducing Randoop [24], for further details. 
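The following Java sketch illustrates the core loop described above in a heavily simplified form; it is not Randoop's implementation or API. Assumptions: the class under test is java.util.ArrayDeque&lt;Integer&gt; with only addFirst(int) and removeFirst(), the pool starts from the constructor-only sequence (rather than null), primitive arguments come from a fixed pool {0, 1}, and a sequence whose execution throws is discarded instead of stored, as a stand-in for Randoop's execution feedback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

// Heavily simplified, hypothetical sketch of feedback-directed random generation in
// the spirit of Randoop (not Randoop's code or API). The class under test is
// java.util.ArrayDeque<Integer>, whose constructor is implicit in every sequence.
class RandomSequenceSketch {

    /** One call of a sequence: a method name plus an int argument (unused by removeFirst). */
    static final class Call {
        final String op;
        final int arg;
        Call(String op, int arg) { this.op = op; this.arg = arg; }
        @Override public String toString() {
            return op.equals("addFirst") ? "addFirst(" + arg + ")" : op + "()";
        }
    }

    /** Re-executes a sequence on a fresh deque; throws if some call is illegal. */
    static Deque<Integer> run(List<Call> seq) {
        Deque<Integer> d = new ArrayDeque<>();
        for (Call c : seq) {
            if (c.op.equals("addFirst")) d.addFirst(c.arg);
            else d.removeFirst(); // NoSuchElementException on an empty deque
        }
        return d;
    }

    /** Grows a pool of legal sequences, starting from the constructor-only sequence. */
    static List<List<Call>> generate(int steps, long seed) {
        Random rnd = new Random(seed);
        int[] primitives = {0, 1};                  // fixed pool of basic values
        String[] ops = {"addFirst", "removeFirst"};
        List<List<Call>> pool = new ArrayList<>();
        pool.add(new ArrayList<>());                // "new ArrayDeque()" only
        for (int i = 0; i < steps; i++) {
            // Reuse a stored sequence as the receiver and append a random method call.
            List<Call> seq = new ArrayList<>(pool.get(rnd.nextInt(pool.size())));
            seq.add(new Call(ops[rnd.nextInt(ops.length)],
                             primitives[rnd.nextInt(primitives.length)]));
            try {
                run(seq);
                pool.add(seq);                      // legal: available as a future input
            } catch (NoSuchElementException e) {
                // illegal sequence: discarded, using execution feedback
            }
        }
        return pool;
    }
}
```

Every sequence kept in the returned pool executes without exceptions, so a sequence starting with removeFirst() on an empty deque never enters it.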

An important issue to remark here is that the execution of each test sequence 
generated by Randoop produces a number of objects for the given type (NCL in 
the example). We exploit this characteristic of Randoop to compute the fitness 
function for a set of methods, although instead of storing actual objects we will 
maintain field extensions, as we explain in more detail in Sect. 4. 


4 An Evolutionary Algorithm for Identifying Sufficient 
Object Builders 


As mentioned before, to find a sufficient set of builders from a program API we 
design a genetic algorithm, which we describe below. Genetic algorithms [14] are
non-exhaustive guided search algorithms, based on a hill climbing strategy [30]. 
The search space is composed of a generally very large set of individuals (the 
candidates), and the search objective is to find an individual with sought-for 
features. As opposed to classic search algorithms, genetic algorithms maintain 
a set of individuals, called the population, and search progresses by iteratively 
selecting a number of individuals in the population, using these for evolution 
(building new individuals out of these), and leaving out some individuals of the 
whole set (the “old” ones and the “new” ones). Selection of individuals for popu- 
lation evolution, as well as individuals’ removal, are guided by a fitness function, 
the heuristic function used to guide the search. This function applies to individ- 
uals, and its result is generalizable to the population too (e.g., the fitness of the 
population may be taken as the fitness of its “fittest” individual). This function 
captures the features sought for in the search, and thus can be used as a halting 
criterion (e.g., algorithm stops after finding an individual with fitness above a 
certain threshold). Finally, individuals are often called chromosomes, and repre- 
sented as vectors of genes that capture their characteristics. This idea is strongly 
related to how new individuals are constructed: by representing candidates as 
vectors of independent characteristics, one can build new candidates by combining
part of the characteristics of an individual with part of the characteristics of
another, or by arbitrarily changing a characteristic of a given individual. These 
two forms of evolution are called crossover and mutation, respectively, and are 
the traditional mechanism to build new candidates out of existing ones in genetic 
algorithms. For further details, we refer the reader to [22]. 


4.1 Chromosome Representation 


In the context of our problem, candidate solutions represent sets of methods 
from the API of the module being analyzed. We then employ vectors of boolean 
values as chromosome representation. Let n be the number of methods in the 
API; the chromosomes in our algorithm will be vectors of size n. For any vector, 
the i-th position is true if and only if the chromosome contains the i-th method 
of the API. For example, there are 34 methods in the NCL’s API (Table 1), 
and we enumerated them from 0 to 33. The sufficient set of builders in Fig. 1.1 
is characterized by the vector with positions 0, 7 and 25 set to true, and the 
remaining positions set to false. In this case, the whole search space consists of 
the 2^34 possible chromosomes.
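In Java, java.util.BitSet is a natural fit for this representation. The Chromosome class below is a hypothetical sketch: bit i is set exactly when method i (numbered as in Table 1) belongs to the candidate set.

```java
import java.util.BitSet;

// Hypothetical sketch: a chromosome as a bit vector over the n methods of an API.
class Chromosome {
    final int n;          // number of methods in the API
    final BitSet genes;   // bit i set <=> method i belongs to the candidate set

    Chromosome(int n, int... methods) {
        this.n = n;
        this.genes = new BitSet(n);
        for (int m : methods) genes.set(m);
    }

    boolean contains(int method) { return genes.get(method); }

    /** Number of methods in the candidate set. */
    int size() { return genes.cardinality(); }

    /** Number of distinct chromosomes for an API with n methods: 2^n. */
    static long searchSpace(int n) { return 1L << n; }
}
```

new Chromosome(34, 0, 7, 25) encodes the builders of Fig. 1.1, and searchSpace(34) returns 17179869184, i.e., 2^34.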


4.2 Fitness Function 


Given a chromosome representing a set of methods M, our fitness function com- 
putes an approximation of the number of bounded objects that can be built 
using combinations of methods in M. Chromosomes with higher fitness values 
are estimated to build more objects than those that have smaller fitness values. 

Ideally, we would like to explore all the feasible objects within a small 
bound k, that can be built using the methods of the current chromosome, i.e., 
BE(M,k). In other words, we need a bounded exhaustive generator for the set 
of methods. The bound k represents the maximum number of objects that can 
be created for each class (in Fig. 1, the number of nodes in the NCL objects
is bounded by k = 2), and the maximum number of primitive values available
(for example, integers from 0 to k - 1). For this purpose, we developed a
prototype by modifying the Randoop tool, discussed briefly in Sect. 3.2. First, we
altered Randoop to work with a fixed set of primitive values (integers from 0 to
k - 1). (Normally, Randoop would save primitive values that are returned by the
execution of tests, and reuse these values in future tests.) Second, we make
Randoop drop sequences of methods that create structures with more than k
objects (of any type), to stop it from building objects larger than needed. To
achieve this, we canonicalize the objects generated by the execution of each
sequence, and we discard the sequence if some object has an index equal to or
larger than k. Third, we extend Randoop with “global” field extensions: when the
execution of a sequence terminates, all the field values of the objects generated
by the sequence are added to the field extensions. For example, if Randoop had
generated the objects in Fig. 1, then the global field extensions would have the
values shown in Fig. 1.3. Our goal is that, given a bound k, when our modified
version of Randoop terminates, the global field extensions contain all the field
values of the bounded exhaustive set of structures with up to k nodes, BE(M,k).
The result of the fitness function for the chromosome is the number of field
values in the global extensions computed by the tool.


(0) NodeCachingLinkedList()
(7) addFirst(Object)
(8) addLast(Object)
(25) removeFirst()


Figure 1.4. A set of sufficient but not minimal builders for NCL


(0) NodeCachingLinkedList()
(4) add(int, Object)
(23) remove(Object)


Figure 1.5. Sufficient and minimal builders for NCL with more complex parameters
than the ones in Fig. 1.1
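The three modifications to Randoop described above can be illustrated on a toy structure. The Java sketch below is hypothetical throughout: TStack is a singly linked stack with push() and pop(), and, to make the result deterministic, it enumerates every sequence of up to maxLen calls instead of sampling sequences at random as the modified Randoop does. Illegal sequences and sequences producing a node with canonical index k or larger are dropped (and never extended), and the field values of the remaining structures are accumulated in a global set whose size plays the role of the fitness.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy structure: a stack with a top field and nodes with a next field.
class TStack {
    TNode top;

    void push() {
        TNode n = new TNode();
        n.next = top;
        top = n;
    }

    void pop() {
        if (top == null) throw new IllegalStateException("empty stack");
        top = top.next;
    }
}

class TNode {
    TNode next;
}

class BoundedExtensionsSketch {
    /** Number of distinct field values reachable with the given methods ("push"/"pop"). */
    static int fitness(Set<String> methods, int k, int maxLen) {
        Set<String> global = new HashSet<>();          // "global" field extensions
        explore(new ArrayList<>(), new ArrayList<>(methods), k, maxLen, global);
        return global.size();
    }

    private static void explore(List<String> seq, List<String> methods, int k,
                                int maxLen, Set<String> global) {
        TStack s = new TStack();                       // the list object, named L0
        try {
            for (String m : seq) {
                if (m.equals("push")) s.push(); else s.pop();
            }
        } catch (IllegalStateException e) {
            return;                                    // illegal: dropped, never extended
        }
        Map<TNode, Integer> ids = new IdentityHashMap<>(); // canonical ids N0, N1, ...
        for (TNode n = s.top; n != null; n = n.next) ids.put(n, ids.size());
        if (ids.size() > k) return;                    // some node has index >= k: dropped
        global.add("top: L0 -> " + name(s.top, ids));
        for (TNode n = s.top; n != null; n = n.next) {
            global.add("next: N" + ids.get(n) + " -> " + name(n.next, ids));
        }
        if (seq.size() == maxLen) return;
        for (String m : methods) {                     // extend with one more call
            seq.add(m);
            explore(seq, methods, k, maxLen, global);
            seq.remove(seq.size() - 1);
        }
    }

    private static String name(TNode n, Map<TNode, Integer> ids) {
        return n == null ? "null" : "N" + ids.get(n);
    }
}
```

With k = 2 and maxLen = 3, both {push} and {push, pop} reach 5 distinct field values, whereas {pop} alone reaches only 1 (the empty stack). For this toy structure pop is therefore superfluous, unlike removeFirst for NCL, which is needed to populate the cache.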

Our rationale for using bounded sets of objects is akin to the small scope 
hypothesis for bug finding [2]: if one set of methods cannot be used to build 
small objects that differentiate it from another set of methods, then
it is unlikely that these two sets can be distinguished with larger objects. This 
hypothesis held during our empirical evaluation across all our case studies. 

We found that, although the process is affected by chance, our tool rarely fails to
build objects that contribute relevant values to the global extensions when small
values of k are employed.


Choosing Better Sets of Builders. In this section, we propose two ways to 
improve our evolutionary algorithm by tailoring the fitness function to obtain 
better sets of builders. This is strongly motivated by the way builders are used 
to build inputs in program analysis. On the one hand, if we have two sufficient 
sets of builders, the set with the smaller number of methods should always be
preferred: there is no reason to include superfluous methods in a set of
builders. For example, the builders in Fig. 1.4 can be used to create the same
NCL objects as the builders in Fig. 1.1 of Sect. 2 (both sets are sufficient), but 
they are not minimal since addLast is superfluous. 

On the other hand, builders with more parameters, or more complex ones, 
are more taxing on program analysis, as they require more effort to be ade- 
quately instantiated. Thus, we define a simple criterion of parameter complexity 
and adapt our fitness to favor builders with simpler parameters over the more 
complex ones. For example, both sets of builders in Figs. 1.1 and 1.5 are sufficient 
and minimal (with 3 routines each), but the builders in Fig. 1.5 have more
parameters that need to be instantiated. Comparing Figs. 1.1 and 1.5, we can observe
that addFirst(Object) has been replaced by add(int, Object), which has an additional
integer parameter, and that removeFirst() has been replaced by remove(Object), which
takes a non-primitive parameter of type Object. Following the criteria explained above,
we would like our algorithm to choose the set in Fig. 1.1 over that of Fig. 1.5.
Incorporating these ideas, the fitness function of our approach is defined by: 


$$
f(M) = \#\mathit{fieldExt}(M) + \frac{w_1\left(1 - \frac{\#M}{\#MT}\right) + w_2\left(1 - \frac{PP(M) + w_3\,RP(M)}{PP(MT) + w_3\,RP(MT)}\right)}{w_1 + w_2}
$$

For a chromosome representing a set M of methods, drawn from the whole set
MT of available methods of the API, the most important part of the fitness of
M is the number of values in the field extensions, #fieldExt(M), that can be
generated using our custom Randoop tool as explained in the previous section.
The second summand implements the ideas presented in this section. It
returns a real value in the interval [0, 1] that is useful to break ties between sets
of methods that generate field extensions with the same number of values. In
the numerator, the first summand penalizes sets with larger numbers of methods,
by computing the quotient of the number of methods in M to the number of
methods in MT, and subtracting the result from 1. Constant w1 (w1 ≥ 1) allows
us to increase or decrease the weight of this summand with respect to the other
summand. The second summand in the numerator penalizes sets of methods with
more complex parameters. Similarly to w1, constant w2 (w2 ≥ 1) serves the
purpose of increasing or decreasing the weight of this term in the sum. Notice that
we count parameters differently depending on their types: each primitive
parameter adds 1 (PP(M) is the number of primitive parameters in the methods
of M), and each reference parameter adds a constant w3 (w3 ≥ 1; RP(M) is the
number of reference-typed parameters in the methods of M), which allows us to
increase the weight of reference parameters with respect to primitive ones.
Intuitively, this second summand computes the ratio between the number
of parameters of M (with added weight for reference parameters) and the number
of (weighted) parameters of MT, and subtracts the result from 1. Finally,
we divide by w1 + w2 to obtain a number in the interval [0, 1].

In our experimental assessment we set w1 = 2, w2 = 1, w3 = 2. These values
were good enough for our approach to produce sufficient and minimal sets of 
builders in all our case studies. 
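For concreteness, the formula can be transcribed directly into Java. The sketch below is hypothetical: it takes all counts as inputs (computing #fieldExt(M) itself is left out), uses the weights w1 = 2, w2 = 1, w3 = 2 reported above, and treats an API without parameters as having a parameter term of 1, a corner case the formula leaves undefined.

```java
// Hypothetical sketch of the fitness function of Sect. 4.2. All inputs are plain
// counts; computing #fieldExt(M) itself (the modified Randoop) is out of scope here.
class BuilderFitness {
    // Weights used in the experimental assessment.
    static final double W1 = 2, W2 = 1, W3 = 2;

    /**
     * fieldExt: #fieldExt(M); m, mt: number of methods in M and in the whole API MT;
     * ppM, rpM / ppMT, rpMT: primitive and reference parameters of M / of MT.
     */
    static double fitness(int fieldExt, int m, int mt,
                          int ppM, int rpM, int ppMT, int rpMT) {
        double sizeTerm = 1.0 - (double) m / mt;                 // fewer methods is better
        double apiParams = ppMT + W3 * rpMT;
        double paramTerm = apiParams == 0
                ? 1.0                                            // API without parameters
                : 1.0 - (ppM + W3 * rpM) / apiParams;            // simpler parameters is better
        return fieldExt + (W1 * sizeTerm + W2 * paramTerm) / (W1 + W2);
    }
}
```

For example, with equal #fieldExt and a hypothetical API totalling 20 primitive and 15 reference parameters, the builders of Fig. 1.1 (one reference parameter) score higher than those of Fig. 1.5 (one primitive and two reference parameters), while any increase in #fieldExt still outweighs the tie-breaking term.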

It is important to remark that the presented criteria for choosing better 
builders are based on the kind of program analyses we target (generation of test
cases for parameterized tests, software model checking). New criteria can be
defined with other goals in mind, and our approach can be adapted to support 
them by modifying the fitness function as we did in this section. 


4.3 Overall Structure of the Genetic Algorithm 


The previously described elements are the constituent parts of the genetic
algorithm implementing our approach. Pseudocode for the genetic algorithm is
shown in Algorithm 1. Notice that Algorithm 1 follows the general structure of
a genetic algorithm.


Algorithm 1. Genetic Algorithm implementing our approach
1: pop ← chromosomes with exactly one true gene
2: for i = 1...numEvo do
3:   pop ← keep the popSize fittest chromosomes from pop
4:   for j = 1...cRate × popSize do
5:     c1, c2 ← select two random chromosomes from pop
6:     new ← single point crossover c1, c2
7:     add new to pop
8:   end for
9:   for c ∈ pop do
10:    new ← mutate each gene of c with probability mRate
11:    if new ≠ c then
12:      add new to pop
13:    end if
14:  end for
15: end for
16: result ← fittest chromosome of pop


The initial population is generated by producing all the feasible chromosomes
with only one available method (vectors with false in all positions except one,
set to true) (line 1). Then, the algorithm iteratively evolves the population
(lines 2-15). At the beginning of each evolution iteration, the algorithm discards
some individuals to control population size, by keeping the popSize fittest
individuals of the current population and discarding the rest (line 3). Then, the
algorithm performs single-point crossover on randomly selected individuals
(lines 4-8). Crossover is applied a number of times that is proportional to the
population size popSize, determined by the product of popSize and the crossover
rate parameter cRate (0 < cRate < 1). Then, the algorithm mutates individuals
(lines 9-14) by changing the value of each of their genes with probability mRate
(0 < mRate < 1). Every individual newly created by the crossover and mutation
operations is added to the population.
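The following standalone Java sketch mirrors Algorithm 1 line by line; it is not the JGap-based implementation used in the paper. The pluggable fitness function stands in for the field-extension-based fitness, results are memoized because each evaluation is expensive, and the crossover point is drawn uniformly from 1 to n - 1; these details are our assumptions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Standalone, hypothetical sketch mirroring Algorithm 1 (the paper's implementation
// builds on JGap instead). Chromosomes are bit vectors over n API methods.
class BuilderGA {
    static BitSet run(int n, ToDoubleFunction<BitSet> fitness, int numEvo, int popSize,
                      double cRate, double mRate, long seed) {
        Random rnd = new Random(seed);
        Map<BitSet, Double> cache = new HashMap<>();   // fitness is expensive: memoize it
        ToDoubleFunction<BitSet> fit = chrom -> cache.computeIfAbsent(chrom, fitness::applyAsDouble);
        Comparator<BitSet> byFitness = Comparator.comparingDouble(fit);

        List<BitSet> pop = new ArrayList<>();          // line 1: one true gene each
        for (int i = 0; i < n; i++) {
            BitSet c = new BitSet(n);
            c.set(i);
            pop.add(c);
        }
        for (int evo = 0; evo < numEvo; evo++) {       // line 2
            pop.sort(byFitness.reversed());            // line 3: keep the popSize fittest
            if (pop.size() > popSize) pop = new ArrayList<>(pop.subList(0, popSize));
            int crossovers = (int) Math.round(cRate * popSize);
            for (int j = 0; j < crossovers; j++) {     // lines 4-8: single-point crossover
                BitSet c1 = pop.get(rnd.nextInt(pop.size()));
                BitSet c2 = pop.get(rnd.nextInt(pop.size()));
                int point = n > 1 ? 1 + rnd.nextInt(n - 1) : 0;
                BitSet child = (BitSet) c2.clone();    // genes [point, n) come from c2
                for (int g = 0; g < point; g++) child.set(g, c1.get(g)); // [0, point) from c1
                pop.add(child);
            }
            for (BitSet c : new ArrayList<>(pop)) {    // lines 9-14: mutation
                BitSet mutant = (BitSet) c.clone();
                for (int g = 0; g < n; g++) {
                    if (rnd.nextDouble() < mRate) mutant.flip(g);
                }
                if (!mutant.equals(c)) pop.add(mutant);
            }
        }
        return Collections.max(pop, byFitness);        // line 16
    }
}
```

With the parameter values reported below, a call would look like run(n, fitness, 20, 30, 0.35, 0.08, seed). Because line 3 always keeps the fittest individuals, the best fitness seen never decreases across iterations.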

The algorithm stops after numEvo evolutions, with numEvo a parameter of 
the algorithm. Notice that we do not have a target value for our fitness, since an
untried set of methods might produce a larger number of field extensions than 
the algorithm has currently seen. Again, there is a compromise to be made for 
choosing a good value for numEvo: a larger number increases the precision of 
the algorithm but increases its running time, whereas a smaller number makes 
it run faster but it might not result in the best set of builders. 

As usual, we empirically found values for the parameters of our algorithm that seem
to work well in practice. In our experimental evaluation, we set numEvo = 20,
popSize = 30, cRate = 0.35 and mRate = 0.08 (the last two are the defaults of
the JGap library).

Most of Algorithm 1 is a default evolutionary implementation of the JGap 
Java library [37]. Notice that, if we take away the complexity of the fitness func- 
tion, our evolutionary algorithm is rather standard, so it is not surprising that 
an existing implementation works well for our purposes. Of course, improvements
to the evolutionary algorithm, and fine tuning of its parameters (e.g.,
crossover/mutation rate) might yield faster execution times.

We also implemented a simple multi-threaded version of our approach, which
helps improve its performance. Basically, at each iteration we make t copies of
the current population, where t is the number of available threads, and evolve 
each of the population replicas independently of the others. After all the threads 
have finished, we keep the 100/t fittest individuals of the population evolved by 
each thread, and use them to build the population for the next iteration of the 
algorithm. 
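A possible shape for one such iteration is sketched below in Java using a fixed thread pool. The names are hypothetical, evolve stands for one sequential evolution step, and keepPerThread corresponds to the number of individuals kept from each replica; evolve must not mutate shared individuals, since the replicas only copy the list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

// Hypothetical sketch of one parallel iteration: t copies of the population are
// evolved independently, then the keepPerThread fittest of each copy are merged.
class ParallelEvolution {
    static <C> List<C> step(List<C> pop, int t, int keepPerThread,
                            UnaryOperator<List<C>> evolve, ToDoubleFunction<C> fitness) {
        ExecutorService executor = Executors.newFixedThreadPool(t);
        try {
            List<Future<List<C>>> futures = new ArrayList<>();
            for (int i = 0; i < t; i++) {
                List<C> replica = new ArrayList<>(pop);     // independent copy per thread
                futures.add(executor.submit(() -> evolve.apply(replica)));
            }
            Comparator<C> fittestFirst = Comparator.comparingDouble(fitness).reversed();
            List<C> next = new ArrayList<>();
            for (Future<List<C>> f : futures) {
                List<C> evolved = new ArrayList<>(f.get()); // waits for that thread
                evolved.sort(fittestFirst);
                next.addAll(evolved.subList(0, Math.min(keepPerThread, evolved.size())));
            }
            return next;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```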


4.4 Reducing the Search Space by Observers Classification 


We say a routine is an observer if it never modifies the parameters it takes, 
and never generates a non-primitive value as a result of its execution. Column 
Obs? in Table 1 (Sect. 2) indicates whether each NCL method is an observer
or not. Clearly, an observer can be used neither to modify nor to build objects,
and therefore can never belong to a minimal set of builders. Hence, if we can
classify observers correctly beforehand, we can remove them from the search
to significantly reduce the search space, without losing precision. For example, in
the NCL API (Table 1) there are 13 observers out of 34 methods, so removing
them discards more than one third of the methods and shrinks the search space
from 2^34 to 2^21 candidate chromosomes.

To detect observers we run another customized Randoop version before our 
evolutionary algorithm. This time, we check for each method whether it modifies 
its inputs at each test sequence generated by Randoop involving the method, 
by canonicalizing the objects before and after execution of the method, and 
checking if the field values of the objects change after execution. If this is the 
case, the method is marked as a builder (not an observer). For return values, if 
in any test sequence generated by Randoop the method returns a non-primitive 
value, then we mark it as a builder as well. We run this custom Randoop until 
it generates a large number of scenarios for each method. Ten to twenty seconds 
was enough for our case studies. At the end of the Randoop execution, methods 
not marked as builders are considered observers and discarded before invoking 
the evolutionary algorithm. 
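The classification can be sketched as follows. In this hypothetical Java version, routines are given as lambdas over sample receivers, and a caller-supplied snapshot function stands in for the canonicalization used to compare states before and after each call; boxed primitive results count as primitive. It illustrates the rule only and is not the customized Randoop itself.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of observer detection by execution. A routine is marked as a
// builder if some run changes its receiver's state or returns a non-primitive value;
// routines never marked are reported as observers.
class ObserverDetector {
    private static final Set<Class<?>> BOXED = Set.of(
            Boolean.class, Character.class, Byte.class, Short.class,
            Integer.class, Long.class, Float.class, Double.class);

    /**
     * routines:  routine name -> call on a receiver (returns null for void routines)
     * receivers: factories for the sample receivers each routine is run on
     * snapshot:  stand-in for canonicalization, mapping a receiver to a comparable state
     */
    static <T> Set<String> observers(Map<String, Function<T, Object>> routines,
                                     List<Supplier<T>> receivers,
                                     Function<T, Object> snapshot) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Function<T, Object>> r : routines.entrySet()) {
            boolean builder = false;
            for (Supplier<T> make : receivers) {
                T obj = make.get();
                Object before = snapshot.apply(obj);
                Object ret = r.getValue().apply(obj);
                if (!before.equals(snapshot.apply(obj))) builder = true;            // modifies input
                if (ret != null && !BOXED.contains(ret.getClass())) builder = true; // returns an object
            }
            if (!builder) result.add(r.getKey());
        }
        return result;
    }
}
```

On java.util.ArrayList, for example, size() and contains(Object) are reported as observers, while add(Object) and clear() modify the receiver and subList(int, int) returns an object, so all three are marked as builders.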

Other approaches exist for the detection of pure methods [15,31] (similar to 
our observers). Note that our evolutionary algorithm is not dependent on the 
method classification algorithm, so any of them could be useful for our purposes. 


5 Experimental Results 


In this section, we experimentally assess our approach. The evaluation is based on 
a benchmark of data structure implementations, including: NCL from Apache Col- 
lections [36]; BinaryTree, BinomialHeap, FibonacciHeap, RedBlackTree taken 
from [35]; UnionFind, an implementation of disjoint sets taken from JGrapht 
[38]. We also evaluate our technique on components of real software projects 
such as Lits from the implementation of Sat4j [3], taken from [20], which consists
of a variable store that monitors when a guess was last made about a value
of a variable, and whether listeners are watching the state of that variable; and 
Scheduler, an implementation of a process scheduler taken from [10]. All the 
experiments were run on 3.4GHz quad-core Intel Core i7-6700 machines with 
8GB of RAM, running GNU/Linux. 

The evaluation consists of two parts. First, we ran our approach (Algorithm 1) 
on the whole module APIs of the aforementioned classes, to compute sets of 
builders for each case study. The goal is to assess how good the identified builders
are, and how long our approach takes to compute them. For each case
study we ran our approach 5 times. The results are shown in Table 2, includ- 
ing the number of routines in the whole API (#API), a sample of identified 
builders (some methods might be interchanged in different runs, e.g., addFirst 
and addLast in NCL), and the average running time (in seconds) of the 5 runs. 
We manually inspected the results, and found that the automatically identi- 
fied sets of builders were in all cases sufficient (all the feasible objects for the 
structure can be constructed using the builders) and minimal (do not contain 
superfluous methods). The approach is reasonably efficient, taking about 30 min 
in the worst case. 

The second part of the evaluation concerns how helpful the identified builders
are in the context of a program analysis, namely, the automated generation
of test cases. These objects might be used, for example, as inputs in parameter- 
ized unit tests. For the case studies that provide mechanisms to measure the size 
of objects and to compare objects by equality (i.e., the size and equals methods 
of data structures), we generated tests with Randoop using all the methods avail- 
able in the API (API), and then we generated tests with Randoop using only the 
builder methods (BLD) identified by our approach in the previous experiment 
(Table 2). We then compare the number of different objects (No. of Objs.), and 
the size of the largest object (Max Obj. Size) created by the tests generated from 
the API, against the tests generated using methods from BLD only. We set three 
different test generation budgets: 60, 120 and 180 seconds (Budget). The results 
are summarized in Table 3. In addition, we consider another approach, API+,
that involves the generation of tests using the API for a budget that encompasses
the test generation budget (Budget) plus the time it takes our approach
to identify builders for the corresponding case study. The results show that, for
the same test budget, BLD generates on average 1280% more objects than API.
Furthermore, when the builders identification time is added to the test generation
budget for API (API+), BLD still generates 568% more objects on average (w.r.t.
API+). In all cases, BLD also generates significantly larger objects than API and
API+. In view of these results, automated builders identification clearly
pays off for the automated generation of structures for stateful classes.

The experiments can be reproduced by following the instructions in the paper 
website [27]. Furthermore, in the site we experimentally show that the builders 
identified by our approach can be employed to build efficient drivers for software 
model checking. We do not show these results here due to space constraints.
Table 2. Builders computation results

Case study | #API | Builders (sample)                                                          | Time (s)
NCL        | 34   | NodeCachingLinkedList(), addFirst(Object), removeFirst()                   | 1744
UFind      | 9    | UnionFind(), addElement(int), union(int,int)                               | 215
FHeap      | 7    | FibonacciHeap(), insert(int), removeMin()                                  | 72
RBT        | 8    | TreeMap(), put(int)                                                        | 73
BTree      | 7    | BinTree(), add(int)                                                        | 73
BHeap      | 10   | BinomialHeap(), insert(int)                                                | 121
Lits       | 26   | Lits(), getFromPool(int), forgets(int), setLevel(int,int), setReason(int)  | 1229
Sched.     | 10   | Schedule(), addProcess(int), blockProcess(), quantumExpire()               | 377


Table 3. Assessment of using the identified builders (BLD) vs the whole API (API) in test case
generation. API+ involves test case generation with the whole API, with budget = (Budget + builders
computation time)

Case study                | Budget (s) | Max Obj. Size        | No. of Objs.
                          |            | API | BLD | API+     | API   | BLD    | API+
NCL (#API: 34, #BLD: 3)   | 60         | 8   | 6   | 11       | 1442  | 42021  | 13119
                          | 120        | 8   | 8   | 11       | 2423  | 69017  | 13247
                          | 180        | 9   | 8   | 11       | 3166  | 91647  | 13505
UFind (#API: 9, #BLD: 3)  | 60         | 8   | 3   | 9        | 3388  | 34250  | 8351
                          | 120        | 9   | 3   | 9        | 5180  | 56418  | 8574
                          | 180        | 9   | 3   | 9        | 6695  | 74425  | 9387
FHeap (#API: 7, #BLD: 3)  | 60         | 11  | 5   | 12       | 6989  | 32639  | 11499
                          | 120        | 12  | 7   | 13       | 11447 | 54264  | 17202
                          | 180        | 12  | 7   | 13       | 15344 | 72413  | 20775
RBT (#API: 8, #BLD: 2)    | 60         | 8   | 5   | 8        | 1812  | 23034  | 3041
                          | 120        | 8   | 5   | 8        | 2678  | 35635  | 3698
                          | 180        | 8   | 5   | 8        | 3358  | 44807  | 3940
BTree (#API: 7, #BLD: 2)  | 60         | 8   | 5   | 8        | 3600  | 24908  | 6019
                          | 120        | 8   | 5   | 8        | 5471  | 39239  | 7387
                          | 180        | 8   | 5   | 9        | 6975  | 50671  | 9247
BHeap (#API: 10, #BLD: 2) | 60         | 9   | 26  | 10       | 3874  | 65915  | 8076
                          | 120        | 10  | 29  | 10       | 5970  | 111402 | 9708
                          | 180        | 10  | 29  | 11       | 7638  | 147260 | 10606


6 Related Work 


As mentioned throughout the paper, the problem of identifying sufficient builders 
is recurrent in various program analyses, including but not limited to software 
model checking and test generation. In works like [18, 23], in the context of
software model checking, and [5, 24, 32, 33], in the context of automated test
generation, to cite just a few, the problem of identifying part of an API and
providing it for analysis is present. Typically, the problem is dealt with manually.

The use of search-based techniques to solve challenging software engineering 
problems is an increasingly popular strategy, which has been applied successfully 
to a number of problems, including test input generation [11], program repair 
[19], and many others. As far as we are aware, this is a novel application of
evolutionary computation in software engineering. An approach that tackles a 
related, but different, problem, is that associated with the SUSHI tool [5]. The 
aim with SUSHI is to feed a genetic algorithm with a path condition, produced 
by a symbolic execution engine, so that an input satisfying the provided path 
condition can be reproduced using a module’s API. This approach assumes that 
the API (or the subset of relevant methods) is provided, as opposed to our work, 
that precisely tackles the provision of the restricted API. 

Our technique requires a mechanism for identifying observers, which we 
have solved within the work in the paper, resorting to random test generation, 
and instrumentation for state monitoring. Approaches to the identification of
observers, or more precisely pure methods, exist in the literature [15, 31].
Regarding these lines of work, notice that the focus of our evolutionary algorithm
is not the identification of observers, but the construction of minimal and sufficient
sets of builders. Moreover, our approach is in fact independent of the mechanism used
to identify observers/pure methods, and thus could be combined with the works 
just cited (i.e., replacing our random testing based approach by an alternative 
one). 


7 Conclusions 


In this work, we presented an evolutionary algorithm for automatically detecting 
sets of builders from a module’s API. We assessed our algorithm over several case 
studies from the literature, and found that it is capable of precisely identifying 
sets of builders that are sufficient and minimal, within reasonable running times. 
To the best of our knowledge, this is the first work that addresses this problem, 
which is typically dealt with manually. 

We also showed preliminary results indicating that our approach can be 
exploited by test case generation tools to yield larger and more diverse objects. 
Other techniques, like software model checking, can benefit as well by using the 
identified set of builders to automatically construct efficient drivers. More exper- 
imentation needs to be done, but given the results in this paper our approach 
looks very promising. 

One of the biggest challenges of this work was the construction of a tool to 
allow us to generate all the bounded structures, for a given maximum number k 
of objects, from the methods of the program API. The proposed solution worked 
well enough for our case studies, but avoiding randomness in the process would 
be desirable. Using bounded exhaustive generation tools rather than random 
generation would better fit our purposes [4], but unfortunately none of the tools 
for bounded exhaustive test generation produce inputs from a module’s API. 
We believe that a promising research direction, that we plan to further explore 
in future work, is to adapt our presented approach for bounded exhaustive test 
generation. 

Some aspects of our genetic algorithm can be further improved. For instance, 
a more powerful classification for argument types, in the prioritization of meth- 
ods according to their complexities, can be defined. Moreover, one may also 
incorporate other dimensions, such as code complexity, to favor simpler meth- 
ods. We will explore this direction as future work. Also, our genetic algorithm 
implementation is, for most parts, a default evolutionary implementation of the 
JGap Java library [37]. Of course, improvements to the evolutionary algorithm, 
and fine tuning for its parameters (e.g., crossover/mutation rate) might yield 
faster execution times, so we plan to investigate this further in future work. 